You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@apr.apache.org by Stefan <lu...@gmx.de> on 2017/01/31 23:23:51 UTC

[PATCH] deadlocking condition with APR_HAS_THREADS

Hi,

the issue was discovered as part of tracing down a deadlock condition in
an SVN test [1].

As far as I understand it, the problem lies in the C-standard not
explicitly defining when a function registered with atexit() is called
in the context of thread termination [2].

The case is reproducible with the following code inside a DLL and
executing that from a calling process (a complete repro project is
available on my issue tracker [1]):

    atexit(apr_terminate);
    apr_initialize();

    apr_pool_t *pool;
    apr_pool_create_ex(&pool, NULL, NULL, NULL);
    apr_thread_pool_t *thread_pool;
    apr_thread_pool_create(&thread_pool, 0, 16, pool);
    apr_thread_pool_idle_wait_set(thread_pool, 1000 * 1000);

    for (int i = 0; i < 3; ++i) {
      apr_thread_pool_push(thread_pool, foo, nullptr, 0, NULL);
    }

In this particular case we create a couple of threads and before the
threads terminate the calling process terminates. At this point
apr_terminate() is called which got registered via the atexit()-call (as
suggested in the apr_terminate-documentation). When that call frees up
the threadpool, it gets stuck in thread_pool_cleanup() waiting for the
thd_cnt to become 0 (which it never will).

I believe the attached patch (done against the apr-util 1.5 branch)
should solve the underlying problem. It determines the number of failed
thread joins (i.e. those which were either detached or did not exit
correctly) and reduces the thd_cnt variable by that amount.

The test case I used on Windows is fixed with that patch and I'll run
the full apr-test suite later today (if I don't get back to this mail it
means the test suite passed for me with that patch).

I also reviewed the implementation of apr_thread_join() on other
platforms and could not spot any immediate problems the patch would
cause on these platforms. However, since I'm quite new to the APR
development and also am not have only marginal knowledge with platforms
other than Windows, I'd appreciate if someone knowing the details about
APR's thread design and who is more familiar with other platforms could
review the patch/approach before I commit the patch.


Regards,
Stefan

[1] http://www.luke1410.de:8090/browse/MAXSVN-94
[2]
https://stackoverflow.com/questions/39655868/what-does-the-posix-standard-say-about-thread-stacks-in-atexit-handlers-what

Re: [PATCH] deadlocking condition with APR_HAS_THREADS

Posted by Stefan Hett <st...@egosoft.com>.
On 2/2/2017 1:25 PM, Branko \u010cibej wrote:
> On 02.02.2017 13:14, Stefan Hett wrote:
>> On 2/2/2017 12:52 PM, Branko \u010cibej wrote:
>>> On 01.02.2017 00:23, Stefan wrote:
>>>> Hi,
>>>>
>>>> the issue was discovered as part of tracing down a deadlock
>>>> condition in
>>>> an SVN test [1].
>>>>
>>>> [...]
>>>>
>>>> In this particular case we create a couple of threads and before the
>>>> threads terminate the calling process terminates. At this point
>>>> apr_terminate() is called which got registered via the atexit()-call
>>>> (as
>>>> suggested in the apr_terminate-documentation).
>>> This is incorrect. apr_terminate() is not called after the calling
>>> process terminates; that would make no sense at all: if there's no
>>> process, it cannot execute any code.
>>>
>>> Handlers registered with atexit() are called "when the program
>>> terminates normally": either when exit() is called, or when main()
>>> returns, and certainly _before_ the process terminates.
>>> [...]
>>>
>> Another example of my bad expression and usage of incorrect
>> terminology. Sorry for that.
>>> In this particular case we create a couple of threads and before the
>>> threads terminate the calling process terminates.
>> This should actually read:
>> In this particular case we create a couple of threads and before these
>> threads finish (i.e. before the threads get to return from their
>> thread_pool_func() function) the process' main function returns.
>>
>> Hope this makes a bit more sense now. :-)
> It does and there's nothing wrong with that, that's perfectly normal and
> expected.
>
Right, but what's not expected is that the threads terminated without 
actually getting to the return statements in thread_pool_func() and 
hence did not decrease the thd_cnt-variable, or is it? In that case I'm 
really missing something.

-- 
Regards,
Stefan Hett


Re: [PATCH] deadlocking condition with APR_HAS_THREADS

Posted by Branko Čibej <br...@apache.org>.
On 02.02.2017 13:14, Stefan Hett wrote:
> On 2/2/2017 12:52 PM, Branko \u010cibej wrote:
>> On 01.02.2017 00:23, Stefan wrote:
>>> Hi,
>>>
>>> the issue was discovered as part of tracing down a deadlock
>>> condition in
>>> an SVN test [1].
>>>
>>> [...]
>>>
>>> In this particular case we create a couple of threads and before the
>>> threads terminate the calling process terminates. At this point
>>> apr_terminate() is called which got registered via the atexit()-call
>>> (as
>>> suggested in the apr_terminate-documentation).
>> This is incorrect. apr_terminate() is not called after the calling
>> process terminates; that would make no sense at all: if there's no
>> process, it cannot execute any code.
>>
>> Handlers registered with atexit() are called "when the program
>> terminates normally": either when exit() is called, or when main()
>> returns, and certainly _before_ the process terminates.
>> [...]
>>
> Another example of my bad expression and usage of incorrect
> terminology. Sorry for that.
>> In this particular case we create a couple of threads and before the
>> threads terminate the calling process terminates.
> This should actually read:
> In this particular case we create a couple of threads and before these
> threads finish (i.e. before the threads get to return from their
> thread_pool_func() function) the process' main function returns.
>
> Hope this makes a bit more sense now. :-)

It does and there's nothing wrong with that, that's perfectly normal and
expected.

-- Brane


Re: [PATCH] deadlocking condition with APR_HAS_THREADS

Posted by Stefan Hett <st...@egosoft.com>.
On 2/2/2017 12:52 PM, Branko \u010cibej wrote:
> On 01.02.2017 00:23, Stefan wrote:
>> Hi,
>>
>> the issue was discovered as part of tracing down a deadlock condition in
>> an SVN test [1].
>>
>> [...]
>>
>> In this particular case we create a couple of threads and before the
>> threads terminate the calling process terminates. At this point
>> apr_terminate() is called which got registered via the atexit()-call (as
>> suggested in the apr_terminate-documentation).
> This is incorrect. apr_terminate() is not called after the calling
> process terminates; that would make no sense at all: if there's no
> process, it cannot execute any code.
>
> Handlers registered with atexit() are called "when the program
> terminates normally": either when exit() is called, or when main()
> returns, and certainly _before_ the process terminates.
> [...]
>
Another example of my bad expression and usage of incorrect terminology. 
Sorry for that.
> In this particular case we create a couple of threads and before the
> threads terminate the calling process terminates.
This should actually read:
In this particular case we create a couple of threads and before these 
threads finish (i.e. before the threads get to return from their 
thread_pool_func() function) the process' main function returns.

Hope this makes a bit more sense now. :-)

Of cause you are absolutely correct that the process itself is still 
running at that point.

-- 
Regards,
Stefan Hett, Developer/Administrator

EGOSOFT GmbH, Heidestrasse 4, 52146 W�rselen, Germany
Tel: +49 2405 4239970, www.egosoft.com
Gesch�ftsf�hrer: Bernd Lehahn, Handelsregister Aachen HRB 13473


Re: [PATCH] deadlocking condition with APR_HAS_THREADS

Posted by Branko Čibej <br...@apache.org>.
On 01.02.2017 00:23, Stefan wrote:
> Hi,
>
> the issue was discovered as part of tracing down a deadlock condition in
> an SVN test [1].
>
> As far as I understand it, the problem lies in the C-standard not
> explicitly defining when a function registered with atexit() is called
> in the context of thread termination [2].
>
> The case is reproducible with the following code inside a DLL and
> executing that from a calling process (a complete repro project is
> available on my issue tracker [1]):
>
>     atexit(apr_terminate);
>     apr_initialize();
>
>     apr_pool_t *pool;
>     apr_pool_create_ex(&pool, NULL, NULL, NULL);
>     apr_thread_pool_t *thread_pool;
>     apr_thread_pool_create(&thread_pool, 0, 16, pool);
>     apr_thread_pool_idle_wait_set(thread_pool, 1000 * 1000);
>
>     for (int i = 0; i < 3; ++i) {
>       apr_thread_pool_push(thread_pool, foo, nullptr, 0, NULL);
>     }
>
> In this particular case we create a couple of threads and before the
> threads terminate the calling process terminates. At this point
> apr_terminate() is called which got registered via the atexit()-call (as
> suggested in the apr_terminate-documentation).

This is incorrect. apr_terminate() is not called after the calling
process terminates; that would make no sense at all: if there's no
process, it cannot execute any code.

Handlers registered with atexit() are called "when the program
terminates normally": either when exit() is called, or when main()
returns, and certainly _before_ the process terminates.

Windows' documentation of atexit has this to say about shared libraries:

    The code in the atexit function should not contain any dependency on
    any DLL which could have already been unloaded when the atexit
    function is called.


This is what might be happening, if Subversion is dynamically loading
the shared library that contains the thread pool code.

-- Brane

Re: [PATCH] deadlocking condition with APR_HAS_THREADS

Posted by Branko Čibej <br...@apache.org>.
On 02.02.2017 12:04, Nick Kew wrote:
> On Wed, 2017-02-01 at 00:23 +0100, Stefan wrote:
>> Hi,
>>
>> the issue was discovered as part of tracing down a deadlock condition in
>> an SVN test [1].
>>
>> As far as I understand it, the problem lies in the C-standard not
>> explicitly defining when a function registered with atexit() is called
>> in the context of thread termination [2].
> I had no idea atexit() was called on thread termination.

It's not.

>   I guess the
> manpage must be misleading in telling us it happens at process exit? 
>
>>     atexit(apr_terminate);
>>     apr_initialize();
> That whole widely-used idiom looks deeply suspect if multiple
> thread exits will each invoke the atexit!  Are you sure this
> isn't just a platform bug?  In which case, a patch should perhaps
> live in platform-specific code?

That answer in SO is nether here nor there. In C (and C++), atexit()
handlers are called when the main() function terminates; either duet to
a call to exit(), or due to an implicit or explicit return from main().
And the "Linux" function _exit() is something else than the standard C
exit() and is closer in semantics to standard C _Exit().

I don't have a recent edition of the C standard available; I do know
that C90 has nothing to say about threads. The C++11  standard does
describe what happens on thread termination and says nothing about
atexit handlers being called. I'd be surprised if C++ behaviour was
different from C in this case.

-- Brane

Re: [PATCH] deadlocking condition with APR_HAS_THREADS

Posted by Stefan <lu...@posteo.de>.
On 2/9/2017 08:17, Branko Čibej wrote:
> On 09.02.2017 03:19, William A Rowe Jr wrote:
>> If you don't re-join your threads to the parent thread, I can't say I
>> believe that this represents a bug. But that's just my 2c from the original
>> architecture.
> The problem is that on Windows, threads have been killed before
> apr_thread_join() is called. That causes apr_thread_join() to return
> APR_INCOMPLETE, and the thread pool cleanup code does not account for that.
>
> The long story is that we recommend the following code to initialize and
> clean up APR:
>
>     atexit(apr_terminate);
>     apr_initialize();
>
> The thread pool constructor registers a pre-cleanup handler so
> theoretically, when apr_terminate() is called, the worker threads will
> be join()'ed. But on Windows, threads are killed before atexit handlers
> are called, causing the thread pool destructor to wait forever for
> threads that are in fact not live any more.
>
>
> I'd say the bug is in the thread pool destructor since it does not
> behave correctly on one of the platforms it's supposed to work on.
>
> -- Brane
Maybe things are getting easier, if we don't call this a bug on APR but
rather see it as a feature improvement of APR to make it easier for
users of APR to use the documented (and established) way to clean up the
APR resources even in light of threads having been killed prior to
apr_terminate() having been called.

What do you think, William, would that make it sound like a reasonable
improvement to APR?

Regards,
Stefan

>> On Feb 8, 2017 4:30 PM, "Stefan" <lu...@posteo.de> wrote:
>>
>>> On 2/2/2017 14:45, Branko Čibej wrote:
>>>> On 02.02.2017 13:30, Stefan Hett wrote:
>>>>> On 2/2/2017 1:24 PM, Branko Čibej wrote:
>>>>>> On 02.02.2017 12:49, Stefan Hett wrote:
>>>>>>> On 2/2/2017 12:04 PM, Nick Kew wrote:
>>>>>>>> On Wed, 2017-02-01 at 00:23 +0100, Stefan wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> the issue was discovered as part of tracing down a deadlock
>>>>>>>>> condition in
>>>>>>>>> an SVN test [1].
>>>>>>>>>
>>>>>>>>> As far as I understand it, the problem lies in the C-standard not
>>>>>>>>> explicitly defining when a function registered with atexit() is
>>>>>>>>> called
>>>>>>>>> in the context of thread termination [2].
>>>>>>>> I had no idea atexit() was called on thread termination.  I guess the
>>>>>>>> manpage must be misleading in telling us it happens at process exit?
>>>>>>> I expressed myself poorly a bit here. I didn't want to suggest
>>>>>>> atexit()-registered functions are called on thread termination but
>>>>>>> rather that threads can be terminated before the atexit-registered
>>>>>>> functions are called.
>>>>>>> So if apr_terminate is registered in a function called from a DLL via
>>>>>>> atexit(), it can happen (and in my scenario does happen) that threads
>>>>>>> got terminated before apr_terminate() is called.
>>>>>> But that should not matter. apr_thread_join() does not terminate a
>>>>>> thread, it waits for the thread to terminate. The thread handle should
>>>>>> remain valid until apr_thread_join() is called, even if the thread
>>>>>> terminated before the join().
>>>>>>
>>>>> This is correct. The problem is that thd_cnt is not decremented
>>>>> (thd->td is valid, apr_thread_join() waits until the thread is (which
>>>>> it is already) and then returns APR_INCOMPLETE - nothing decrements
>>>>> the thd_cnt value for the already terminated thread).
>>>> It turns out that this is a bug in APR's thread pool code, and has been
>>>> fixed on trunk. APR_INCOMPLETE is the _expected_ return code from
>>>> apr_thread_join() on Windows in this case, as far as I can interpret the
>>>> code.
>>>>
>>> As it turned out, the issue is actually still present on trunk. The full
>>> details behind the problem are summed up in a blog post now:
>>> http://www.luke1410.de/blog/?p=95
>>>
>>> What do you (or the others) think. Does this warrant the patch being
>>> applied? I'd really not feel too good with leaving that issue unresolved
>>> in APR, given there's a fix available; but I'd certainly not commit it,
>>> if it's not getting support by the established community/committers.
>>>
>>> If someone things the patch needs to be improved (or a different
>>> approach to the problem should be taken), I'm certainly offering to come
>>> up with alternatives.
>>>
>>> Regards,
>>> Stefan
>>>



Re: [PATCH] deadlocking condition with APR_HAS_THREADS

Posted by Branko Čibej <br...@apache.org>.
On 09.02.2017 03:19, William A Rowe Jr wrote:
> If you don't re-join your threads to the parent thread, I can't say I
> believe that this represents a bug. But that's just my 2c from the original
> architecture.

The problem is that on Windows, threads have been killed before
apr_thread_join() is called. That causes apr_thread_join() to return
APR_INCOMPLETE, and the thread pool cleanup code does not account for that.

The long story is that we recommend the following code to initialize and
clean up APR:

    atexit(apr_terminate);
    apr_initialize();

The thread pool constructor registers a pre-cleanup handler so
theoretically, when apr_terminate() is called, the worker threads will
be join()'ed. But on Windows, threads are killed before atexit handlers
are called, causing the thread pool destructor to wait forever for
threads that are in fact not live any more.


I'd say the bug is in the thread pool destructor since it does not
behave correctly on one of the platforms it's supposed to work on.

-- Brane

> On Feb 8, 2017 4:30 PM, "Stefan" <lu...@posteo.de> wrote:
>
>> On 2/2/2017 14:45, Branko \u010cibej wrote:
>>> On 02.02.2017 13:30, Stefan Hett wrote:
>>>> On 2/2/2017 1:24 PM, Branko \u010cibej wrote:
>>>>> On 02.02.2017 12:49, Stefan Hett wrote:
>>>>>> On 2/2/2017 12:04 PM, Nick Kew wrote:
>>>>>>> On Wed, 2017-02-01 at 00:23 +0100, Stefan wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> the issue was discovered as part of tracing down a deadlock
>>>>>>>> condition in
>>>>>>>> an SVN test [1].
>>>>>>>>
>>>>>>>> As far as I understand it, the problem lies in the C-standard not
>>>>>>>> explicitly defining when a function registered with atexit() is
>>>>>>>> called
>>>>>>>> in the context of thread termination [2].
>>>>>>> I had no idea atexit() was called on thread termination.  I guess the
>>>>>>> manpage must be misleading in telling us it happens at process exit?
>>>>>> I expressed myself poorly a bit here. I didn't want to suggest
>>>>>> atexit()-registered functions are called on thread termination but
>>>>>> rather that threads can be terminated before the atexit-registered
>>>>>> functions are called.
>>>>>> So if apr_terminate is registered in a function called from a DLL via
>>>>>> atexit(), it can happen (and in my scenario does happen) that threads
>>>>>> got terminated before apr_terminate() is called.
>>>>> But that should not matter. apr_thread_join() does not terminate a
>>>>> thread, it waits for the thread to terminate. The thread handle should
>>>>> remain valid until apr_thread_join() is called, even if the thread
>>>>> terminated before the join().
>>>>>
>>>> This is correct. The problem is that thd_cnt is not decremented
>>>> (thd->td is valid, apr_thread_join() waits until the thread is (which
>>>> it is already) and then returns APR_INCOMPLETE - nothing decrements
>>>> the thd_cnt value for the already terminated thread).
>>> It turns out that this is a bug in APR's thread pool code, and has been
>>> fixed on trunk. APR_INCOMPLETE is the _expected_ return code from
>>> apr_thread_join() on Windows in this case, as far as I can interpret the
>>> code.
>>>
>> As it turned out, the issue is actually still present on trunk. The full
>> details behind the problem are summed up in a blog post now:
>> http://www.luke1410.de/blog/?p=95
>>
>> What do you (or the others) think. Does this warrant the patch being
>> applied? I'd really not feel too good with leaving that issue unresolved
>> in APR, given there's a fix available; but I'd certainly not commit it,
>> if it's not getting support by the established community/committers.
>>
>> If someone things the patch needs to be improved (or a different
>> approach to the problem should be taken), I'm certainly offering to come
>> up with alternatives.
>>
>> Regards,
>> Stefan
>>


Re: [PATCH] deadlocking condition with APR_HAS_THREADS

Posted by William A Rowe Jr <wr...@rowe-clan.net>.
If you don't re-join your threads to the parent thread, I can't say I
believe that this represents a bug. But that's just my 2c from the original
architecture.


On Feb 8, 2017 4:30 PM, "Stefan" <lu...@posteo.de> wrote:

> On 2/2/2017 14:45, Branko Čibej wrote:
> > On 02.02.2017 13:30, Stefan Hett wrote:
> >> On 2/2/2017 1:24 PM, Branko Čibej wrote:
> >>> On 02.02.2017 12:49, Stefan Hett wrote:
> >>>> On 2/2/2017 12:04 PM, Nick Kew wrote:
> >>>>> On Wed, 2017-02-01 at 00:23 +0100, Stefan wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> the issue was discovered as part of tracing down a deadlock
> >>>>>> condition in
> >>>>>> an SVN test [1].
> >>>>>>
> >>>>>> As far as I understand it, the problem lies in the C-standard not
> >>>>>> explicitly defining when a function registered with atexit() is
> >>>>>> called
> >>>>>> in the context of thread termination [2].
> >>>>> I had no idea atexit() was called on thread termination.  I guess the
> >>>>> manpage must be misleading in telling us it happens at process exit?
> >>>> I expressed myself poorly a bit here. I didn't want to suggest
> >>>> atexit()-registered functions are called on thread termination but
> >>>> rather that threads can be terminated before the atexit-registered
> >>>> functions are called.
> >>>> So if apr_terminate is registered in a function called from a DLL via
> >>>> atexit(), it can happen (and in my scenario does happen) that threads
> >>>> got terminated before apr_terminate() is called.
> >>> But that should not matter. apr_thread_join() does not terminate a
> >>> thread, it waits for the thread to terminate. The thread handle should
> >>> remain valid until apr_thread_join() is called, even if the thread
> >>> terminated before the join().
> >>>
> >> This is correct. The problem is that thd_cnt is not decremented
> >> (thd->td is valid, apr_thread_join() waits until the thread is (which
> >> it is already) and then returns APR_INCOMPLETE - nothing decrements
> >> the thd_cnt value for the already terminated thread).
> >
> > It turns out that this is a bug in APR's thread pool code, and has been
> > fixed on trunk. APR_INCOMPLETE is the _expected_ return code from
> > apr_thread_join() on Windows in this case, as far as I can interpret the
> > code.
> >
> As it turned out, the issue is actually still present on trunk. The full
> details behind the problem are summed up in a blog post now:
> http://www.luke1410.de/blog/?p=95
>
> What do you (or the others) think. Does this warrant the patch being
> applied? I'd really not feel too good with leaving that issue unresolved
> in APR, given there's a fix available; but I'd certainly not commit it,
> if it's not getting support by the established community/committers.
>
> If someone things the patch needs to be improved (or a different
> approach to the problem should be taken), I'm certainly offering to come
> up with alternatives.
>
> Regards,
> Stefan
>

Re: [PATCH] deadlocking condition with APR_HAS_THREADS

Posted by Stefan <lu...@posteo.de>.
On 2/2/2017 14:45, Branko \u010cibej wrote:
> On 02.02.2017 13:30, Stefan Hett wrote:
>> On 2/2/2017 1:24 PM, Branko \u010cibej wrote:
>>> On 02.02.2017 12:49, Stefan Hett wrote:
>>>> On 2/2/2017 12:04 PM, Nick Kew wrote:
>>>>> On Wed, 2017-02-01 at 00:23 +0100, Stefan wrote:
>>>>>> Hi,
>>>>>>
>>>>>> the issue was discovered as part of tracing down a deadlock
>>>>>> condition in
>>>>>> an SVN test [1].
>>>>>>
>>>>>> As far as I understand it, the problem lies in the C-standard not
>>>>>> explicitly defining when a function registered with atexit() is
>>>>>> called
>>>>>> in the context of thread termination [2].
>>>>> I had no idea atexit() was called on thread termination.  I guess the
>>>>> manpage must be misleading in telling us it happens at process exit?
>>>> I expressed myself poorly a bit here. I didn't want to suggest
>>>> atexit()-registered functions are called on thread termination but
>>>> rather that threads can be terminated before the atexit-registered
>>>> functions are called.
>>>> So if apr_terminate is registered in a function called from a DLL via
>>>> atexit(), it can happen (and in my scenario does happen) that threads
>>>> got terminated before apr_terminate() is called.
>>> But that should not matter. apr_thread_join() does not terminate a
>>> thread, it waits for the thread to terminate. The thread handle should
>>> remain valid until apr_thread_join() is called, even if the thread
>>> terminated before the join().
>>>
>> This is correct. The problem is that thd_cnt is not decremented
>> (thd->td is valid, apr_thread_join() waits until the thread is (which
>> it is already) and then returns APR_INCOMPLETE - nothing decrements
>> the thd_cnt value for the already terminated thread).
>
> It turns out that this is a bug in APR's thread pool code, and has been
> fixed on trunk. APR_INCOMPLETE is the _expected_ return code from
> apr_thread_join() on Windows in this case, as far as I can interpret the
> code.
>
As it turned out, the issue is actually still present on trunk. The full
details behind the problem are summed up in a blog post now:
http://www.luke1410.de/blog/?p=95

What do you (or the others) think. Does this warrant the patch being
applied? I'd really not feel too good with leaving that issue unresolved
in APR, given there's a fix available; but I'd certainly not commit it,
if it's not getting support by the established community/committers.

If someone things the patch needs to be improved (or a different
approach to the problem should be taken), I'm certainly offering to come
up with alternatives.

Regards,
Stefan

Re: [PATCH] deadlocking condition with APR_HAS_THREADS

Posted by Branko Čibej <br...@apache.org>.
On 02.02.2017 13:30, Stefan Hett wrote:
> On 2/2/2017 1:24 PM, Branko \u010cibej wrote:
>> On 02.02.2017 12:49, Stefan Hett wrote:
>>> On 2/2/2017 12:04 PM, Nick Kew wrote:
>>>> On Wed, 2017-02-01 at 00:23 +0100, Stefan wrote:
>>>>> Hi,
>>>>>
>>>>> the issue was discovered as part of tracing down a deadlock
>>>>> condition in
>>>>> an SVN test [1].
>>>>>
>>>>> As far as I understand it, the problem lies in the C-standard not
>>>>> explicitly defining when a function registered with atexit() is
>>>>> called
>>>>> in the context of thread termination [2].
>>>> I had no idea atexit() was called on thread termination.  I guess the
>>>> manpage must be misleading in telling us it happens at process exit?
>>> I expressed myself poorly a bit here. I didn't want to suggest
>>> atexit()-registered functions are called on thread termination but
>>> rather that threads can be terminated before the atexit-registered
>>> functions are called.
>>> So if apr_terminate is registered in a function called from a DLL via
>>> atexit(), it can happen (and in my scenario does happen) that threads
>>> got terminated before apr_terminate() is called.
>> But that should not matter. apr_thread_join() does not terminate a
>> thread, it waits for the thread to terminate. The thread handle should
>> remain valid until apr_thread_join() is called, even if the thread
>> terminated before the join().
>>
> This is correct. The problem is that thd_cnt is not decremented
> (thd->td is valid, apr_thread_join() waits until the thread is (which
> it is already) and then returns APR_INCOMPLETE - nothing decrements
> the thd_cnt value for the already terminated thread).


It turns out that this is a bug in APR's thread pool code, and has been
fixed on trunk. APR_INCOMPLETE is the _expected_ return code from
apr_thread_join() on Windows in this case, as far as I can interpret the
code.


-- Brane


Re: [PATCH] deadlocking condition with APR_HAS_THREADS

Posted by Stefan Hett <st...@egosoft.com>.
On 2/2/2017 1:24 PM, Branko \u010cibej wrote:
> On 02.02.2017 12:49, Stefan Hett wrote:
>> On 2/2/2017 12:04 PM, Nick Kew wrote:
>>> On Wed, 2017-02-01 at 00:23 +0100, Stefan wrote:
>>>> Hi,
>>>>
>>>> the issue was discovered as part of tracing down a deadlock
>>>> condition in
>>>> an SVN test [1].
>>>>
>>>> As far as I understand it, the problem lies in the C-standard not
>>>> explicitly defining when a function registered with atexit() is called
>>>> in the context of thread termination [2].
>>> I had no idea atexit() was called on thread termination.  I guess the
>>> manpage must be misleading in telling us it happens at process exit?
>> I expressed myself poorly a bit here. I didn't want to suggest
>> atexit()-registered functions are called on thread termination but
>> rather that threads can be terminated before the atexit-registered
>> functions are called.
>> So if apr_terminate is registered in a function called from a DLL via
>> atexit(), it can happen (and in my scenario does happen) that threads
>> got terminated before apr_terminate() is called.
> But that should not matter. apr_thread_join() does not terminate a
> thread, it waits for the thread to terminate. The thread handle should
> remain valid until apr_thread_join() is called, even if the thread
> terminated before the join().
>
This is correct. The problem is that thd_cnt is not decremented (thd->td 
is valid, apr_thread_join() waits until the thread is (which it is 
already) and then returns APR_INCOMPLETE - nothing decrements the 
thd_cnt value for the already terminated thread).

-- 
Regards,
Stefan Hett


Re: [PATCH] deadlocking condition with APR_HAS_THREADS

Posted by Branko Čibej <br...@apache.org>.
On 02.02.2017 12:49, Stefan Hett wrote:
> On 2/2/2017 12:04 PM, Nick Kew wrote:
>> On Wed, 2017-02-01 at 00:23 +0100, Stefan wrote:
>>> Hi,
>>>
>>> the issue was discovered as part of tracing down a deadlock
>>> condition in
>>> an SVN test [1].
>>>
>>> As far as I understand it, the problem lies in the C-standard not
>>> explicitly defining when a function registered with atexit() is called
>>> in the context of thread termination [2].
>> I had no idea atexit() was called on thread termination.  I guess the
>> manpage must be misleading in telling us it happens at process exit?
> I expressed myself poorly a bit here. I didn't want to suggest
> atexit()-registered functions are called on thread termination but
> rather that threads can be terminated before the atexit-registered
> functions are called.
> So if apr_terminate is registered in a function called from a DLL via
> atexit(), it can happen (and in my scenario does happen) that threads
> got terminated before apr_terminate() is called.

But that should not matter. apr_thread_join() does not terminate a
thread, it waits for the thread to terminate. The thread handle should
remain valid until apr_thread_join() is called, even if the thread
terminated before the join().

-- Brane

Re: [PATCH] deadlocking condition with APR_HAS_THREADS

Posted by Stefan Hett <st...@egosoft.com>.
On 2/2/2017 12:04 PM, Nick Kew wrote:
> On Wed, 2017-02-01 at 00:23 +0100, Stefan wrote:
>> Hi,
>>
>> the issue was discovered as part of tracing down a deadlock condition in
>> an SVN test [1].
>>
>> As far as I understand it, the problem lies in the C-standard not
>> explicitly defining when a function registered with atexit() is called
>> in the context of thread termination [2].
> I had no idea atexit() was called on thread termination.  I guess the
> manpage must be misleading in telling us it happens at process exit?
I expressed myself poorly a bit here. I didn't want to suggest 
atexit()-registered functions are called on thread termination but 
rather that threads can be terminated before the atexit-registered 
functions are called.
So if apr_terminate is registered in a function called from a DLL via 
atexit(), it can happen (and in my scenario does happen) that threads 
got terminated before apr_terminate() is called.

Here's the code from the referenced repro-project mentioned in the 
initial mail:

TESTDLL_API int fntestdll(void)
{
     atexit(apr_terminate);
     apr_initialize();

     apr_pool_t *pool;
     apr_pool_create_ex(&pool, NULL, NULL, NULL);
     apr_thread_pool_t *thread_pool;
     apr_thread_pool_create(&thread_pool, 0, 16, pool);
     apr_thread_pool_idle_wait_set(thread_pool, 1000 * 1000);

     for (int i = 0; i < 3; ++i) {
       apr_thread_pool_push(thread_pool, foo, nullptr, 0, NULL);
     }
     return 0;
}

fntestdll is then called from the main-function in the test-application.

In this case it appears that the threads terminate first, before 
apr_terminate() is executed.
I'm not sure whether this behavior is Windows-specific or not, but from 
what I read/learned on the topic so far on the web, it seems that it's 
outside the scope of the standard to say anything about DLLs and shared 
library handling in this context.

While this particular issue could be prevented in other ways (including 
platform specific solutions), I still think that the actual issue of the 
current implementation is that trimming idle-threads does not handle 
already detached/interrupted/terminated threads (while apr_join_thread's 
return value actually indicates that to the caller. Hence the most 
reasonable solution to this problem seemed to be the proposed patch.

-- 
Regards,
Stefan Hett


Re: [PATCH] deadlocking condition with APR_HAS_THREADS

Posted by Nick Kew <ni...@apache.org>.
On Wed, 2017-02-01 at 00:23 +0100, Stefan wrote:
> Hi,
> 
> the issue was discovered as part of tracing down a deadlock condition in
> an SVN test [1].
> 
> As far as I understand it, the problem lies in the C-standard not
> explicitly defining when a function registered with atexit() is called
> in the context of thread termination [2].

I had no idea atexit() was called on thread termination.  I guess the
manpage must be misleading in telling us it happens at process exit? 

>     atexit(apr_terminate);
>     apr_initialize();

That whole widely-used idiom looks deeply suspect if multiple
thread exits will each invoke the atexit!  Are you sure this
isn't just a platform bug?  In which case, a patch should perhaps
live in platform-specific code?

-- 
Nick Kew


Re: [PATCH] deadlocking condition with APR_HAS_THREADS

Posted by Stefan <lu...@gmx.de>.
On 2/1/2017 09:58, Ivan Zhakov wrote:
> On 1 February 2017 at 02:23, Stefan <lu...@gmx.de> wrote:
>> Hi,
>>
> [...]
>
>> The case is reproducible with the following code inside a DLL and
>> executing that from a calling process (a complete repro project is
>> available on my issue tracker [1]):
> You have to wait callbacks to finish if thread pool callback resides
> in DLL, because it could be unloaded before APR will destroy
> threadpool.
>
>
I'm having difficulties following you here. Note that all threads
already terminated (the idle threads processed in trim_idle_threads got
terminated without cleanly exiting the thread_pool_func() function and
therefore did not decrement the thread-counter). Am I missing something?

Regards,
Stefan



Re: [PATCH] deadlocking condition with APR_HAS_THREADS

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On 1 February 2017 at 02:23, Stefan <lu...@gmx.de> wrote:
> Hi,
>
[...]

> The case is reproducible with the following code inside a DLL and
> executing that from a calling process (a complete repro project is
> available on my issue tracker [1]):
You have to wait callbacks to finish if thread pool callback resides
in DLL, because it could be unloaded before APR will destroy
threadpool.


-- 
Ivan Zhakov