Posted to modperl@perl.apache.org by Max Barry <ma...@maxbarry.com> on 2011/11/14 04:42:16 UTC

Re: Apache Children Stuck on futex

On 29/10/11 22:43, Torsten Förtsch wrote:
> On Wednesday, 26 October 2011 05:56:49 Max Barry wrote:
>> $ strace -p 24133
>> Process 24133 attached - interrupt to quit
>> read(5, "!", 1)                         = 1
>> tgkill(24133, 24164, SIGHUP)            = 0
>> tgkill(24133, 24164, SIG_0)             = 0
>> --- SIGTERM (Terminated) @ 0 (0) ---
>> rt_sigreturn(0xf)                       = 0
>> select(0, NULL, NULL, NULL, {0, 500000}) = 0 (Timeout)
>> tgkill(24133, 24140, SIGUSR1)           = 0
>> futex(0x7f9904f4e9d0, FUTEX_WAIT, 24140, NULL
> 
> It would be interesting to see which futex it is blocked on. One way to 
> check that is perhaps to allow core dumps in the apache config and then 
> to send a core dump signal like SEGV, BUS or similar when the process 
> hangs. Use the dump file then to get a stack trace.
> 
> Torsten Förtsch

Thank you very much for the reply! Here is the result:

http://pastebin.com/YDbmq84w

This shows me:
* running the Apache benchmarking utility to generate lots of requests
* identifying a process hung in 'futex_wait' (11447)
* killing it with SEGV
* obtaining a stack trace

Max.

Re: Apache Children Stuck on futex

Posted by SalusaSecondus <sa...@nationstates.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
 
(Sorry about the grave-dig, but this is still an issue.)

I'm still coming up to speed on the inner workings of mod_perl (I've
never played in it before), but Max asked me to take a look at the futex
problem, so I thought I'd try to pick up where it was left off and
hopefully get this fixed.

My system:
Linux modperl 2.6.38-8-server #42-Ubuntu SMP Mon Apr 11 03:49:04 UTC
2011 x86_64 x86_64 x86_64 GNU/Linux
Apache/2.2.20 (built from Ubuntu source packages with debug symbols)
mod_perl 2.0.5 (built from Ubuntu source packages with debug symbols)

Torsten Förtsch wrote:

> Can you install the symbol tables for your modperl and perhaps check the
> values of *tipool in the core? I think it is
>
> tipool->size == tipool->in_use == tipool->cfg->max

3, 0, and 5 respectively for all threads blocked on
modperl_tipool_wait(tipool).

> BTW, there are IMHO many points about the tipool implementation that
> can be improved. Why do we use these lists? Wouldn't it be better to
> allocate an array of tipool->cfg->max pointers? Or perhaps an
> apr_hash_t in pconf?

As I don't understand the inner workings yet, I don't know, but I hope
to figure it out.

Greg
P.S. Since it's been a while, here is the archived thread:
http://www.gossamer-threads.com/lists/modperl/modperl/103558#103558
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
 
iEYEARECAAYFAk85M4QACgkQF1oFywYE3z7GvACfSZ2uU7Vfnn60rRlEJHBNLkVk
nL8AoO7bz5sEM/B7OSDdZhgxbvi1j7gT
=NCp+
-----END PGP SIGNATURE-----


Re: Apache Children Stuck on futex

Posted by Max Barry <ma...@maxbarry.com>.
I'm told there are people watching this issue, so the good news is my
colleague Greg Rubin seems to have tracked down the source of the
problem! There is a patch & description here:

http://www.gossamer-threads.com/lists/modperl/dev/104026

Max.

Re: Apache Children Stuck on futex

Posted by Max Barry <ma...@maxbarry.com>.
On 15/11/11 01:26, Torsten Förtsch wrote:
> On Monday, 14 November 2011 04:42:16 Max Barry wrote:
>> Here is the result:
>>
>> http://pastebin.com/YDbmq84w
>>
>> This shows me:
>> * running the Apache benchmarking utility to generate lots of requests
>> * identifying a process hung in 'futex_wait' (11447)
>> * killing it with SEGV
>> * obtaining a stack trace
> 
> Thanks Max. It really seems to be a modperl problem. I think there is 
> either something fishy with modperl_tipool_putback_base() or someone 
> writes to a location that it doesn't own.
> 
> Many of your threads block in modperl_tipool_pop() waiting for an 
> interpreter to become available:
> 
>         /* block until an item becomes available */
>         modperl_tipool_wait(tipool);
> 
> In src/modules/perl/modperl_tipool.c in function 
> modperl_tipool_putback_base() you find these lines:
> 
>     if (!listp) {
>         /* XXX: Attempt to putback something that was never there */
>         modperl_tipool_unlock(tipool);
>         return;
>     }
> 
> I think the code should not return here but call abort() and dump core 
> because if it enters the if-block it tries to push back an interpreter 
> that was not taken from the pool. But why would someone call 
> modperl_tipool_putback_base() if not to release an interpreter? Hence the 
> interpreter is lost. The other part of the function seems quite 
> reasonable. So, I think modperl_tipool_putback_base() is sometimes called 
> with a wrong data pointer and thus leaks interpreters.
> 
> Can you install the symbol tables for your modperl and perhaps check the 
> values of *tipool in the core? I think it is
> 
>   tipool->size == tipool->in_use == tipool->cfg->max
> 
> That would explain the behavior.
> 
> BTW, there are IMHO many points about the tipool implementation that can 
> be improved. Why do we use these lists? Wouldn't it be better to 
> allocate an array of tipool->cfg->max pointers? Or perhaps an apr_hash_t 
> in pconf?
> 
> Torsten Förtsch

Hi Torsten,

I'm afraid that installing debugging symbols is beyond me, but I have
confirmed that the problem is reproducible in a clean Ubuntu Server install.

Here is me going from a brand new Ubuntu Server install to futex_wait
hang in a few easy steps:

http://pastebin.com/ahDtAeAS

To reproduce:

1. Download an ISO of Ubuntu Server 11.10 64-bit. (I got it from a local
mirror:
http://mirror.aarnet.edu.au/pub/ubuntu/releases/11.10/ubuntu-11.10-server-amd64.iso).

2. Install as a virtual machine. (I installed inside VirtualBox
4.1.4-r74291, accepting all defaults and installing no additional packages.)

3. Install mod_perl2, configure the 'default' site to use it, and lower
MaxRequestsPerChild.

4. Smash the server with requests.

I hope this is sufficient to let you find the problem. Please let me
know if I can help further.

Max.



Re: Apache Children Stuck on futex

Posted by Torsten Förtsch <to...@gmx.net>.
On Monday, 14 November 2011 04:42:16 Max Barry wrote:
> Here is the result:
> 
> http://pastebin.com/YDbmq84w
> 
> This shows me:
> * running the Apache benchmarking utility to generate lots of requests
> * identifying a process hung in 'futex_wait' (11447)
> * killing it with SEGV
> * obtaining a stack trace

Thanks Max. It really seems to be a modperl problem. I think there is 
either something fishy with modperl_tipool_putback_base() or someone 
writes to a location that it doesn't own.

Many of your threads block in modperl_tipool_pop() waiting for an 
interpreter to become available:

        /* block until an item becomes available */
        modperl_tipool_wait(tipool);
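
To illustrate the blocking described above, here is a minimal sketch of a pool whose pop waits on a condition variable. It is not mod_perl's actual code: wait_pool, wait_pool_init, wait_pool_pop and wait_pool_putback are hypothetical names, and the pool only counts items rather than holding interpreters.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical, simplified pool: it tracks counts only. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  available;
    int idle;    /* items ready to be handed out */
    int in_use;  /* items currently checked out */
} wait_pool;

static void wait_pool_init(wait_pool *p, int idle)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->available, NULL);
    p->idle = idle;
    p->in_use = 0;
}

/* Blocks until an item is idle, then checks it out. */
static void wait_pool_pop(wait_pool *p)
{
    pthread_mutex_lock(&p->lock);
    while (p->idle == 0) {
        /* With glibc on Linux, a thread parked here sits in a futex wait. */
        pthread_cond_wait(&p->available, &p->lock);
    }
    p->idle--;
    p->in_use++;
    pthread_mutex_unlock(&p->lock);
}

/* Returns an item to the pool and wakes one waiting thread. */
static void wait_pool_putback(wait_pool *p)
{
    pthread_mutex_lock(&p->lock);
    assert(p->in_use > 0);
    p->in_use--;
    p->idle++;
    pthread_cond_signal(&p->available);
    pthread_mutex_unlock(&p->lock);
}
```

In a pool of this shape, a waiter is only released by a matching putback. If a putback is skipped or fails to find its item, every later pop can block forever.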

In src/modules/perl/modperl_tipool.c in function 
modperl_tipool_putback_base() you find these lines:

    if (!listp) {
        /* XXX: Attempt to putback something that was never there */
        modperl_tipool_unlock(tipool);
        return;
    }

I think the code should not return here but call abort() and dump core 
because if it enters the if-block it tries to push back an interpreter 
that was not taken from the pool. But why would someone call 
modperl_tipool_putback_base() if not to release an interpreter? Hence the 
interpreter is lost. The other part of the function seems quite 
reasonable. So, I think modperl_tipool_putback_base() is sometimes called 
with a wrong data pointer and thus leaks interpreters.
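
The suspected leak can be sketched with a toy list-based pool. This is an illustration under assumptions, not the real modperl_tipool.c: node, list_pool, list_pop and list_putback are hypothetical names, and the failure is reported as a return value so the effect is easy to observe.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    void *data;
    struct node *next;
} node;

/* Hypothetical pool with one idle list and one busy list. */
typedef struct {
    node *idle;
    node *busy;
    int in_use;
} list_pool;

/* Move the head of the idle list to the busy list; NULL if none is idle. */
static void *list_pop(list_pool *p)
{
    node *n = p->idle;
    if (!n) {
        return NULL;
    }
    p->idle = n->next;
    n->next = p->busy;
    p->busy = n;
    p->in_use++;
    return n->data;
}

/* Returns 0 on success, -1 if data is not on the busy list. */
static int list_putback(list_pool *p, void *data)
{
    node **link = &p->busy;
    while (*link && (*link)->data != data) {
        link = &(*link)->next;
    }
    if (!*link) {
        /* Lookup failed: the item the caller meant to release stays busy.
           A silent return hides this; calling abort() here instead would
           produce a core dump that points at the offending caller. */
        return -1;
    }
    node *n = *link;
    *link = n->next;     /* unlink from the busy list */
    n->next = p->idle;   /* push onto the idle list */
    p->idle = n;
    p->in_use--;
    return 0;
}
```

Calling list_putback() with a pointer the pool never handed out leaves in_use unchanged, so the pool stays exhausted even though the caller believes it released its item.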

Can you install the symbol tables for your modperl and perhaps check the 
values of *tipool in the core? I think it is

  tipool->size == tipool->in_use == tipool->cfg->max

That would explain the behavior.

BTW, there are IMHO many points about the tipool implementation that can 
be improved. Why do we use these lists? Wouldn't it be better to 
allocate an array of tipool->cfg->max pointers? Or perhaps an apr_hash_t 
in pconf?
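
As a rough sketch of the array idea, assuming a fixed upper bound known at startup: array_pool and its functions are hypothetical names, ARRAY_POOL_MAX stands in for tipool->cfg->max, and locking is left out to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_POOL_MAX 5  /* stand-in for tipool->cfg->max (assumption) */

typedef struct {
    void *slots[ARRAY_POOL_MAX];         /* one pointer per possible item */
    unsigned char busy[ARRAY_POOL_MAX];  /* 1 while the slot is checked out */
    int size;                            /* slots currently populated */
    int in_use;
} array_pool;

/* Add an item; returns its slot index, or -1 if the pool is full. */
static int array_pool_add(array_pool *p, void *item)
{
    if (p->size == ARRAY_POOL_MAX) {
        return -1;
    }
    p->slots[p->size] = item;
    p->busy[p->size] = 0;
    return p->size++;
}

/* Check out the first idle slot; returns its index, or -1 if none is idle. */
static int array_pool_pop(array_pool *p)
{
    for (int i = 0; i < p->size; i++) {
        if (!p->busy[i]) {
            p->busy[i] = 1;
            p->in_use++;
            return i;
        }
    }
    return -1;
}

/* Release a slot by index; returns -1 for an invalid or idle slot. */
static int array_pool_putback(array_pool *p, int slot)
{
    if (slot < 0 || slot >= p->size || !p->busy[slot]) {
        return -1;
    }
    p->busy[slot] = 0;
    p->in_use--;
    return 0;
}
```

Releasing by slot index turns putback into a bounds check plus a flag test, so a bogus release is detected directly instead of falling out of a list walk.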

Torsten Förtsch
