You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@thrift.apache.org by "Rush Manbert (JIRA)" <ji...@apache.org> on 2009/05/06 01:32:30 UTC

[jira] Created: (THRIFT-488) ThreadManager crashing bugs

ThreadManager crashing bugs
---------------------------

                 Key: THRIFT-488
                 URL: https://issues.apache.org/jira/browse/THRIFT-488
             Project: Thrift
          Issue Type: Bug
          Components: Library (C++)
    Affects Versions: 0.1
         Environment: Mac OS X 10.5.6, Xcode
            Reporter: Rush Manbert
         Attachments: ThreadManager.cpp.patch

The ThreadManager::Impl and the ThreadManager::Worker classes work together to execute client threads. There are race conditions between the two classes that can cause Bus Error exceptions in certain cases. In general, these occur when a ThreadManager instance is destructed, so might not be seen in long running programs. They happen frequently enough, though, that looped repetitions of ThreadManagerTests::blockTest() (part of the concurrency_test program) fail quite often.

These errors are generally not seen with the current version of ThreadManagerTests::blockTest() due to errors in the test itself that cause failures at a far higher frequency. In order to see them, you need to apply the patches that are attached to THRIFT-487 (https://issues.apache.org/jira/browse/THRIFT-487).

Test procedure:
1) Apply the patch from THRIFT-487 for the Tests.cpp file.
2) Run make in lib/cpp in order to rebuild concurrency_test
3) Run concurrency_test with the command line argument "thread-manager" and observe that the test hangs in no time.
4) Apply the patch from THRIFT-487 for the ThreadManagerTests.h file.
5) Run make in lib/cpp
6) Run concurrency_test as before. Observe that now it runs for longer (generally) and usually fails with an assert in Monitor.cpp. This failure is because of one of the bugs in ThreadManager.
7) Apply the attached patch file for ThreadManager.cpp
8) Run make in lib/cpp
9) Run concurrency_test as before. It should just run, and run, and run.

Note that there is a possible path through the original ThreadManager::Worker::run() method where active never becomes true. In practice, exercising this code path is difficuly. The way that I exercised it was to edit line 322 in the patched version of ThreadManager.cpp. I changed the for statement to read:
for (size_t ix = 0; ix < value + 1; ix++)
so that the ThreadManager always created more workers than were needed. That extra worker caused quite a bit of trouble until I moved his handling up to the top of the run() method. I don't understand how this situation could occur in real life, but this code appears to handle it correctly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-488) ThreadManager crashing bugs

Posted by "Rush Manbert (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706639#action_12706639 ] 

Rush Manbert commented on THRIFT-488:
-------------------------------------

Okay, that doesn't work either because in order to access the object pointed to by the weak_ptr, you need to convert it to a shared_ptr. We do that, then the Worker finds that he should be idle, so he waits forever on the monitor_ mutex. Because the worker holds a shared_ptr, the manager's use count never goes to zero and his destructor never gets called.

I haven't used weak_ptr before, and there is a chance that I messed up, so I am going to add a patch that converts an unmodified ThreadManager.cpp to my version that uses weak_ptr. As always, you need to apply THRIFT-487 to the test code in order to use it.

I had another idea, but it sort of smells bad. If all of the shared resources were in an object (a structure) allocated on the heap and all the players had a shared_ptr to it, then none of the mutex variables would disappear. We would also need the manager to set a flag when he goes away, so we won't leak Workers and tasks. And we would need to protect access to the shared structure. I hate the idea of adding another mutex.

Thoughts?

> ThreadManager crashing bugs
> ---------------------------
>
>                 Key: THRIFT-488
>                 URL: https://issues.apache.org/jira/browse/THRIFT-488
>             Project: Thrift
>          Issue Type: Bug
>          Components: Library (C++)
>    Affects Versions: 0.1
>         Environment: Mac OS X 10.5.6, Xcode
>            Reporter: Rush Manbert
>         Attachments: ThreadManager.cpp.patch, ThreadManagerAssertPatch.txt
>
>
> The ThreadManager::Impl and the ThreadManager::Worker classes work together to execute client threads. There are race conditions between the two classes that can cause Bus Error exceptions in certain cases. In general, these occur when a ThreadManager instance is destructed, so might not be seen in long running programs. They happen frequently enough, though, that looped repetitions of ThreadManagerTests::blockTest() (part of the concurrency_test program) fail quite often.
> These errors are generally not seen with the current version of ThreadManagerTests::blockTest() due to errors in the test itself that cause failures at a far higher frequency. In order to see them, you need to apply the patches that are attached to THRIFT-487 (https://issues.apache.org/jira/browse/THRIFT-487).
> Test procedure:
> 1) Apply the patch from THRIFT-487 for the Tests.cpp file.
> 2) Run make in lib/cpp in order to rebuild concurrency_test
> 3) Run concurrency_test with the command line argument "thread-manager" and observe that the test hangs in no time.
> 4) Apply the patch from THRIFT-487 for the ThreadManagerTests.h file.
> 5) Run make in lib/cpp
> 6) Run concurrency_test as before. Observe that now it runs for longer (generally) and usually fails with an assert in Monitor.cpp. This failure is because of one of the bugs in ThreadManager.
> 7) Apply the attached patch file for ThreadManager.cpp
> 8) Run make in lib/cpp
> 9) Run concurrency_test as before. It should just run, and run, and run.
> Note that there is a possible path through the original ThreadManager::Worker::run() method where active never becomes true. In practice, exercising this code path is difficuly. The way that I exercised it was to edit line 322 in the patched version of ThreadManager.cpp. I changed the for statement to read:
> for (size_t ix = 0; ix < value + 1; ix++)
> so that the ThreadManager always created more workers than were needed. That extra worker caused quite a bit of trouble until I moved his handling up to the top of the run() method. I don't understand how this situation could occur in real life, but this code appears to handle it correctly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-488) ThreadManager crashing bugs

Posted by "Marc Kwiatkowski (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706325#action_12706325 ] 

Marc Kwiatkowski commented on THRIFT-488:
-----------------------------------------

Rush:

I need to go through this in detail but I'm not sure this is the right approach to the problem.  I wonder if it makes more sense to wrap the Worker threads' references to ThreadManager with a boost:weak_ptr/boost::shared_ptr.  Currently the ThreadManager dtor calls stop(false) ie don't block for worker threads.  My original thinking was that if the client didn't want the ThreadManager to hang around waiting for potentially stuck worker threads, it shouldn't, but as you note that leads to references to dtor'ed objects.

Switching to weak_ptr allows ThreadManager's dtor to keep the best effort stop and still let active threads clean up properly after the primary reference to ThreadManager is dropped.

Let me know what you think.

> ThreadManager crashing bugs
> ---------------------------
>
>                 Key: THRIFT-488
>                 URL: https://issues.apache.org/jira/browse/THRIFT-488
>             Project: Thrift
>          Issue Type: Bug
>          Components: Library (C++)
>    Affects Versions: 0.1
>         Environment: Mac OS X 10.5.6, Xcode
>            Reporter: Rush Manbert
>         Attachments: ThreadManager.cpp.patch
>
>
> The ThreadManager::Impl and the ThreadManager::Worker classes work together to execute client threads. There are race conditions between the two classes that can cause Bus Error exceptions in certain cases. In general, these occur when a ThreadManager instance is destructed, so might not be seen in long running programs. They happen frequently enough, though, that looped repetitions of ThreadManagerTests::blockTest() (part of the concurrency_test program) fail quite often.
> These errors are generally not seen with the current version of ThreadManagerTests::blockTest() due to errors in the test itself that cause failures at a far higher frequency. In order to see them, you need to apply the patches that are attached to THRIFT-487 (https://issues.apache.org/jira/browse/THRIFT-487).
> Test procedure:
> 1) Apply the patch from THRIFT-487 for the Tests.cpp file.
> 2) Run make in lib/cpp in order to rebuild concurrency_test
> 3) Run concurrency_test with the command line argument "thread-manager" and observe that the test hangs in no time.
> 4) Apply the patch from THRIFT-487 for the ThreadManagerTests.h file.
> 5) Run make in lib/cpp
> 6) Run concurrency_test as before. Observe that now it runs for longer (generally) and usually fails with an assert in Monitor.cpp. This failure is because of one of the bugs in ThreadManager.
> 7) Apply the attached patch file for ThreadManager.cpp
> 8) Run make in lib/cpp
> 9) Run concurrency_test as before. It should just run, and run, and run.
> Note that there is a possible path through the original ThreadManager::Worker::run() method where active never becomes true. In practice, exercising this code path is difficuly. The way that I exercised it was to edit line 322 in the patched version of ThreadManager.cpp. I changed the for statement to read:
> for (size_t ix = 0; ix < value + 1; ix++)
> so that the ThreadManager always created more workers than were needed. That extra worker caused quite a bit of trouble until I moved his handling up to the top of the run() method. I don't understand how this situation could occur in real life, but this code appears to handle it correctly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (THRIFT-488) ThreadManager crashing bugs

Posted by "Rush Manbert (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rush Manbert updated THRIFT-488:
--------------------------------

    Attachment: ThreadManagerAssertPatch.txt

For testing, a patch that just adds some asserts to ThreadManager.cpp. See my last comment for more detail.

> ThreadManager crashing bugs
> ---------------------------
>
>                 Key: THRIFT-488
>                 URL: https://issues.apache.org/jira/browse/THRIFT-488
>             Project: Thrift
>          Issue Type: Bug
>          Components: Library (C++)
>    Affects Versions: 0.1
>         Environment: Mac OS X 10.5.6, Xcode
>            Reporter: Rush Manbert
>         Attachments: ThreadManager.cpp.patch, ThreadManagerAssertPatch.txt
>
>
> The ThreadManager::Impl and the ThreadManager::Worker classes work together to execute client threads. There are race conditions between the two classes that can cause Bus Error exceptions in certain cases. In general, these occur when a ThreadManager instance is destructed, so might not be seen in long running programs. They happen frequently enough, though, that looped repetitions of ThreadManagerTests::blockTest() (part of the concurrency_test program) fail quite often.
> These errors are generally not seen with the current version of ThreadManagerTests::blockTest() due to errors in the test itself that cause failures at a far higher frequency. In order to see them, you need to apply the patches that are attached to THRIFT-487 (https://issues.apache.org/jira/browse/THRIFT-487).
> Test procedure:
> 1) Apply the patch from THRIFT-487 for the Tests.cpp file.
> 2) Run make in lib/cpp in order to rebuild concurrency_test
> 3) Run concurrency_test with the command line argument "thread-manager" and observe that the test hangs in no time.
> 4) Apply the patch from THRIFT-487 for the ThreadManagerTests.h file.
> 5) Run make in lib/cpp
> 6) Run concurrency_test as before. Observe that now it runs for longer (generally) and usually fails with an assert in Monitor.cpp. This failure is because of one of the bugs in ThreadManager.
> 7) Apply the attached patch file for ThreadManager.cpp
> 8) Run make in lib/cpp
> 9) Run concurrency_test as before. It should just run, and run, and run.
> Note that there is a possible path through the original ThreadManager::Worker::run() method where active never becomes true. In practice, exercising this code path is difficuly. The way that I exercised it was to edit line 322 in the patched version of ThreadManager.cpp. I changed the for statement to read:
> for (size_t ix = 0; ix < value + 1; ix++)
> so that the ThreadManager always created more workers than were needed. That extra worker caused quite a bit of trouble until I moved his handling up to the top of the run() method. I don't understand how this situation could occur in real life, but this code appears to handle it correctly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (THRIFT-488) ThreadManager crashing bugs

Posted by "Rush Manbert (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rush Manbert updated THRIFT-488:
--------------------------------

    Attachment: ThreadManagerWeakPtr.txt

Patch to make ThreadManager::Impl and ThreadManager:Worker use weak_ptr. See my previous comment for more info.

> ThreadManager crashing bugs
> ---------------------------
>
>                 Key: THRIFT-488
>                 URL: https://issues.apache.org/jira/browse/THRIFT-488
>             Project: Thrift
>          Issue Type: Bug
>          Components: Library (C++)
>    Affects Versions: 0.1
>         Environment: Mac OS X 10.5.6, Xcode
>            Reporter: Rush Manbert
>         Attachments: ThreadManager.cpp.patch, ThreadManagerAssertPatch.txt, ThreadManagerWeakPtr.txt
>
>
> The ThreadManager::Impl and the ThreadManager::Worker classes work together to execute client threads. There are race conditions between the two classes that can cause Bus Error exceptions in certain cases. In general, these occur when a ThreadManager instance is destructed, so might not be seen in long running programs. They happen frequently enough, though, that looped repetitions of ThreadManagerTests::blockTest() (part of the concurrency_test program) fail quite often.
> These errors are generally not seen with the current version of ThreadManagerTests::blockTest() due to errors in the test itself that cause failures at a far higher frequency. In order to see them, you need to apply the patches that are attached to THRIFT-487 (https://issues.apache.org/jira/browse/THRIFT-487).
> Test procedure:
> 1) Apply the patch from THRIFT-487 for the Tests.cpp file.
> 2) Run make in lib/cpp in order to rebuild concurrency_test
> 3) Run concurrency_test with the command line argument "thread-manager" and observe that the test hangs in no time.
> 4) Apply the patch from THRIFT-487 for the ThreadManagerTests.h file.
> 5) Run make in lib/cpp
> 6) Run concurrency_test as before. Observe that now it runs for longer (generally) and usually fails with an assert in Monitor.cpp. This failure is because of one of the bugs in ThreadManager.
> 7) Apply the attached patch file for ThreadManager.cpp
> 8) Run make in lib/cpp
> 9) Run concurrency_test as before. It should just run, and run, and run.
> Note that there is a possible path through the original ThreadManager::Worker::run() method where active never becomes true. In practice, exercising this code path is difficuly. The way that I exercised it was to edit line 322 in the patched version of ThreadManager.cpp. I changed the for statement to read:
> for (size_t ix = 0; ix < value + 1; ix++)
> so that the ThreadManager always created more workers than were needed. That extra worker caused quite a bit of trouble until I moved his handling up to the top of the run() method. I don't understand how this situation could occur in real life, but this code appears to handle it correctly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-488) ThreadManager crashing bugs

Posted by "Rush Manbert (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706525#action_12706525 ] 

Rush Manbert commented on THRIFT-488:
-------------------------------------

What I see happening in stop() is that it always calls removeWorker(), which blocks waiting for workerCount_ to decrement sufficiently, so I'm not sure that you're achieving your goal.

Regardless, you think we could keep the current ThreadManager implementation, but pass a shared_ptr (that seems cleaner than a weak_ptr, since no testing will be required before attempting to acquire locks) rather than a raw pointer? I see that we can guarantee that every ThreadManager instance is owned by a shared_ptr, so that's safe. I can give this a try.

> ThreadManager crashing bugs
> ---------------------------
>
>                 Key: THRIFT-488
>                 URL: https://issues.apache.org/jira/browse/THRIFT-488
>             Project: Thrift
>          Issue Type: Bug
>          Components: Library (C++)
>    Affects Versions: 0.1
>         Environment: Mac OS X 10.5.6, Xcode
>            Reporter: Rush Manbert
>         Attachments: ThreadManager.cpp.patch
>
>
> The ThreadManager::Impl and the ThreadManager::Worker classes work together to execute client threads. There are race conditions between the two classes that can cause Bus Error exceptions in certain cases. In general, these occur when a ThreadManager instance is destructed, so might not be seen in long running programs. They happen frequently enough, though, that looped repetitions of ThreadManagerTests::blockTest() (part of the concurrency_test program) fail quite often.
> These errors are generally not seen with the current version of ThreadManagerTests::blockTest() due to errors in the test itself that cause failures at a far higher frequency. In order to see them, you need to apply the patches that are attached to THRIFT-487 (https://issues.apache.org/jira/browse/THRIFT-487).
> Test procedure:
> 1) Apply the patch from THRIFT-487 for the Tests.cpp file.
> 2) Run make in lib/cpp in order to rebuild concurrency_test
> 3) Run concurrency_test with the command line argument "thread-manager" and observe that the test hangs in no time.
> 4) Apply the patch from THRIFT-487 for the ThreadManagerTests.h file.
> 5) Run make in lib/cpp
> 6) Run concurrency_test as before. Observe that now it runs for longer (generally) and usually fails with an assert in Monitor.cpp. This failure is because of one of the bugs in ThreadManager.
> 7) Apply the attached patch file for ThreadManager.cpp
> 8) Run make in lib/cpp
> 9) Run concurrency_test as before. It should just run, and run, and run.
> Note that there is a possible path through the original ThreadManager::Worker::run() method where active never becomes true. In practice, exercising this code path is difficuly. The way that I exercised it was to edit line 322 in the patched version of ThreadManager.cpp. I changed the for statement to read:
> for (size_t ix = 0; ix < value + 1; ix++)
> so that the ThreadManager always created more workers than were needed. That extra worker caused quite a bit of trouble until I moved his handling up to the top of the run() method. I don't understand how this situation could occur in real life, but this code appears to handle it correctly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (THRIFT-488) ThreadManager crashing bugs

Posted by "Rush Manbert (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/THRIFT-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rush Manbert updated THRIFT-488:
--------------------------------

    Attachment: ThreadManager.cpp.patch

Patch for ThreadManager.cpp

> ThreadManager crashing bugs
> ---------------------------
>
>                 Key: THRIFT-488
>                 URL: https://issues.apache.org/jira/browse/THRIFT-488
>             Project: Thrift
>          Issue Type: Bug
>          Components: Library (C++)
>    Affects Versions: 0.1
>         Environment: Mac OS X 10.5.6, Xcode
>            Reporter: Rush Manbert
>         Attachments: ThreadManager.cpp.patch
>
>
> The ThreadManager::Impl and the ThreadManager::Worker classes work together to execute client threads. There are race conditions between the two classes that can cause Bus Error exceptions in certain cases. In general, these occur when a ThreadManager instance is destructed, so might not be seen in long running programs. They happen frequently enough, though, that looped repetitions of ThreadManagerTests::blockTest() (part of the concurrency_test program) fail quite often.
> These errors are generally not seen with the current version of ThreadManagerTests::blockTest() due to errors in the test itself that cause failures at a far higher frequency. In order to see them, you need to apply the patches that are attached to THRIFT-487 (https://issues.apache.org/jira/browse/THRIFT-487).
> Test procedure:
> 1) Apply the patch from THRIFT-487 for the Tests.cpp file.
> 2) Run make in lib/cpp in order to rebuild concurrency_test
> 3) Run concurrency_test with the command line argument "thread-manager" and observe that the test hangs in no time.
> 4) Apply the patch from THRIFT-487 for the ThreadManagerTests.h file.
> 5) Run make in lib/cpp
> 6) Run concurrency_test as before. Observe that now it runs for longer (generally) and usually fails with an assert in Monitor.cpp. This failure is because of one of the bugs in ThreadManager.
> 7) Apply the attached patch file for ThreadManager.cpp
> 8) Run make in lib/cpp
> 9) Run concurrency_test as before. It should just run, and run, and run.
> Note that there is a possible path through the original ThreadManager::Worker::run() method where active never becomes true. In practice, exercising this code path is difficuly. The way that I exercised it was to edit line 322 in the patched version of ThreadManager.cpp. I changed the for statement to read:
> for (size_t ix = 0; ix < value + 1; ix++)
> so that the ThreadManager always created more workers than were needed. That extra worker caused quite a bit of trouble until I moved his handling up to the top of the run() method. I don't understand how this situation could occur in real life, but this code appears to handle it correctly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (THRIFT-488) ThreadManager crashing bugs

Posted by "Rush Manbert (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/THRIFT-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706570#action_12706570 ] 

Rush Manbert commented on THRIFT-488:
-------------------------------------

Doh! Can't use shared_ptr because then the ThreadManager::Impl destructor is never called, so stop() is never called, so nothing is ever deleted and you run out of resources after a while.

I could look into using weak_ptr. I think it would work, but it's going to mask a problem. I will atach a new patch file called ThreadManagerAssertPatch.txt. If you apply it to an unmodified ThreadManager.cpp and apply the test patches from THRIFT-487 you will find that it asserts at line 412 in ThreadManager.cpp. If you remove that assert, it will start asserting at line 417, and if you remove that assert it will start asserting at line 377.

This is because there is a race which I'll try to describe. I will use line numbers from the patched version of ThreadManager.cpp.
1) Manager notifies all idle workers at line 405
2) Workers waiting at line 248 wake up. They are active, but the manager's task queue is empty, so they nark themselves as idle and decrement the worker count. In our case, the count was one and now it is zero, which is equal to workerMaxCount_.
3) Sometime before the worker can add himself to the deadWorkers queue, and sometimes before he even atempts to get the workerMonitor_ mutex, the manager runs again.
4) The manager tests workerCount_ against workerMaxCount_ at line 413 and finds that they are equal, so he does not wait. He goes on to the loop at line 418 and does nothing because deadWorkers is empty. The thing that hapens the most presently is that the manager completes his destructor call, then the worker wakes up and tries to get the mutex at line 288 and things go sour.

If we add the weak_ptr, then we can factor it into isActive(), and we can also check it outside of the scope that starts at line 287. But that masks the race condition, it doesn't fix it. My original patch was intended to fix the race conditions, including the strange case where a Worker starts but finds that he is not active.

I'll do a version that passes a weak_ptr and post my results.

> ThreadManager crashing bugs
> ---------------------------
>
>                 Key: THRIFT-488
>                 URL: https://issues.apache.org/jira/browse/THRIFT-488
>             Project: Thrift
>          Issue Type: Bug
>          Components: Library (C++)
>    Affects Versions: 0.1
>         Environment: Mac OS X 10.5.6, Xcode
>            Reporter: Rush Manbert
>         Attachments: ThreadManager.cpp.patch, ThreadManagerAssertPatch.txt
>
>
> The ThreadManager::Impl and the ThreadManager::Worker classes work together to execute client threads. There are race conditions between the two classes that can cause Bus Error exceptions in certain cases. In general, these occur when a ThreadManager instance is destructed, so might not be seen in long running programs. They happen frequently enough, though, that looped repetitions of ThreadManagerTests::blockTest() (part of the concurrency_test program) fail quite often.
> These errors are generally not seen with the current version of ThreadManagerTests::blockTest() due to errors in the test itself that cause failures at a far higher frequency. In order to see them, you need to apply the patches that are attached to THRIFT-487 (https://issues.apache.org/jira/browse/THRIFT-487).
> Test procedure:
> 1) Apply the patch from THRIFT-487 for the Tests.cpp file.
> 2) Run make in lib/cpp in order to rebuild concurrency_test
> 3) Run concurrency_test with the command line argument "thread-manager" and observe that the test hangs in no time.
> 4) Apply the patch from THRIFT-487 for the ThreadManagerTests.h file.
> 5) Run make in lib/cpp
> 6) Run concurrency_test as before. Observe that now it runs for longer (generally) and usually fails with an assert in Monitor.cpp. This failure is because of one of the bugs in ThreadManager.
> 7) Apply the attached patch file for ThreadManager.cpp
> 8) Run make in lib/cpp
> 9) Run concurrency_test as before. It should just run, and run, and run.
> Note that there is a possible path through the original ThreadManager::Worker::run() method where active never becomes true. In practice, exercising this code path is difficuly. The way that I exercised it was to edit line 322 in the patched version of ThreadManager.cpp. I changed the for statement to read:
> for (size_t ix = 0; ix < value + 1; ix++)
> so that the ThreadManager always created more workers than were needed. That extra worker caused quite a bit of trouble until I moved his handling up to the top of the run() method. I don't understand how this situation could occur in real life, but this code appears to handle it correctly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.