You are viewing a plain text version of this content. The canonical link for it is here.

Posted to modules-dev@httpd.apache.org by Joshua Marantz <jm...@google.com> on 2011/09/01 13:52:31 UTC

po

Hi,

I've been load-testing our module
(mod_pagespeed<http://code.google.com/speed/page-speed/docs/module.html>)
with httpd 2.2.16 built with these options:
     --enable-pool-debug --with-mpm=worker
I've been getting periodic aborts from apr_table_addn that don't look like
they are from my module.  These do not occur when using 'prefork'.

Here's a stack-trace recovered from a core file:

Program terminated with signal 6, Aborted.
#0  0x00007fdd3bbd9a75 in raise (sig=<value optimized out>) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
 in ../nptl/sysdeps/unix/sysv/linux/raise.c
(gdb) where
#0  0x00007fdd3bbd9a75 in raise (sig=<value optimized out>) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007fdd3bbdd5c0 in abort () at abort.c:92
#2  0x00007fdd3c9a2e57 in apr_table_addn (t=0xe84980, key=0xd51d60
"Accept-Encoding", val=0xd51d71 "identity") at tables/apr_tables.c:813
#3  0x0000000000433e36 in ap_get_mime_headers_core (r=0xf27de0, bb=0xdda3a0)
at protocol.c:799
#4  0x000000000043456b in ap_read_request (conn=0xe51620) at protocol.c:918
#5  0x000000000047f772 in ap_process_http_connection (c=0xe51620) at
http_core.c:183
#6  0x0000000000446e28 in ap_run_process_connection (c=0xe51620) at
connection.c:43
#7  0x00000000004b3297 in process_socket (thd=<value optimized out>,
dummy=<value optimized out>) at worker.c:544
#8  worker_thread (thd=<value optimized out>, dummy=<value optimized out>)
at worker.c:894
#9  0x00007fdd3c1339ca in start_thread (arg=<value optimized out>) at
pthread_create.c:300
#10 0x00007fdd3bc8c70d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:112


Questions

1. Is this a bug in httpd?
2. Or could I somehow have caused this with a programming error in my
module?
3. Or is enable-pool-debug simply incompatible with the Worker MPM?

Thanks,
-Josh

Re: po

Posted by Joshua Marantz <jm...@google.com>.

On Thu, Sep 1, 2011 at 11:16 AM, Ray Morris <su...@bettercgi.com> wrote:

> > this code has to run crazy fast and has lots of mileage on it.
> ...
> > OK given the stack-trace above it's hard for me to figure out
> > a path back from my module.
>
> Why not run the test without your new module loaded?
>

This is such an excellent idea.  The module is, unfortunately in this case,
generating the load for the load-test, as well as handling it.  I think I
can have the module do *less* but if the module is not loaded at all, then
there will be no load.

> That seems like a far simpler and more reliable indication
> of whether the issue is in the code with "mileage on it" or
> in the new code.
>

The code with "mileage" in protocol.c is not really under suspicion at this
point -- I probably shouldn't have mentioned it, except that it appears in
the stack at the time of the abort(), and mod_pagespeed doesn't.

The code that I suspect has a lot less mileage on it is the application
of pool debugging to the Worker MPM.  That's why I want to know if anyone
else has put those two together under heavy load.

-Josh

Re: po

Posted by Ray Morris <su...@bettercgi.com>.

> this code has to run crazy fast and has lots of mileage on it. 
...
> OK given the stack-trace above it's hard for me to figure out 
> a path back from my module.  

Why not run the test without your new module loaded?
That sems like a far simpler and more reliable indication 
of whether the issue is in the code with "mileage on it" or
in the new code.


> Hmmm...don't know what happened to that subject line "po".  
> Not what I meant to type, obviously!

Another O would have been quite embarassing wouldn't it.
-- 
Ray Morris
support@bettercgi.com

Strongbox - The next generation in site security:
http://www.bettercgi.com/strongbox/

Throttlebox - Intelligent Bandwidth Control
http://www.bettercgi.com/throttlebox/

Strongbox / Throttlebox affiliate program:
http://www.bettercgi.com/affiliates/user/register.php

Re: po

Posted by Joshua Marantz <jm...@google.com>.

Hi Ben,

Hmmm...don't know what happened to that subject line "po".  Not what I meant
to type, obviously!

On Thu, Sep 1, 2011 at 8:14 AM, Ben Noordhuis <in...@bnoordhuis.nl> wrote:
>
> That assertion is triggered when you add a string from pool A to a
> table in pool B where A is a child of B (adding headers from the
> request_rec to a conn_rec table, for example). It's a lifecycle issue.
>

OK given the stack-trace above it's hard for me to figure out a path back
from my module.  Of course a random memory stomp could make any weird thing
happen, but only this one specific thing always happens.  And we try to
avoid memory-stomps via valgrind etc though it's impossible to prove that
none exist under load.

But I think I've found 2 potential problems in httpd:
  1. protocol.c calls apr_table_addn using a char* value that is offset from
a pool-allocated string.  It's not obvious that it's doing
      this safely but it makes sense this code has to run crazy fast and has
lots of mileage on it.  And it's probably OK for the pool-find
      algorithm appears to do a pointer-range check in apr_pools.c:1908,
function pool_find().  For the moment I'll assume this is not
      the actual problem.
  2. there are a bunch of ifdefs in apr_pools.c for POOL_DEBUG and there's a
some mutexing for APR_HAS_THREADS, e.g. in
     apr_allocator_max_free_set.  But I suspect the debug structures for
POOL_DEBUG are not adequately mutexed.

Has anyone specifically used pool-debug and worker-mpm together under heavy
load?

Thanks!
-Josh

Re: po

Posted by Ben Noordhuis <in...@bnoordhuis.nl>.

On Thu, Sep 1, 2011 at 13:52, Joshua Marantz <jm...@google.com> wrote:
> Hi,
>
> I've been load-testing our module
> (mod_pagespeed<http://code.google.com/speed/page-speed/docs/module.html>)
> with httpd 2.2.16 built with these options:
>     --enable-pool-debug --with-mpm=worker
> I've been getting periodic aborts from apr_table_addn that don't look like
> they are from my module.  These do not occur when using 'prefork'.
>
> Here's a stack-trace recovered from a core file:
>
> Program terminated with signal 6, Aborted.
> #0  0x00007fdd3bbd9a75 in raise (sig=<value optimized out>) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> 64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>  in ../nptl/sysdeps/unix/sysv/linux/raise.c
> (gdb) where
> #0  0x00007fdd3bbd9a75 in raise (sig=<value optimized out>) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1  0x00007fdd3bbdd5c0 in abort () at abort.c:92
> #2  0x00007fdd3c9a2e57 in apr_table_addn (t=0xe84980, key=0xd51d60
> "Accept-Encoding", val=0xd51d71 "identity") at tables/apr_tables.c:813
> #3  0x0000000000433e36 in ap_get_mime_headers_core (r=0xf27de0, bb=0xdda3a0)
> at protocol.c:799
> #4  0x000000000043456b in ap_read_request (conn=0xe51620) at protocol.c:918
> #5  0x000000000047f772 in ap_process_http_connection (c=0xe51620) at
> http_core.c:183
> #6  0x0000000000446e28 in ap_run_process_connection (c=0xe51620) at
> connection.c:43
> #7  0x00000000004b3297 in process_socket (thd=<value optimized out>,
> dummy=<value optimized out>) at worker.c:544
> #8  worker_thread (thd=<value optimized out>, dummy=<value optimized out>)
> at worker.c:894
> #9  0x00007fdd3c1339ca in start_thread (arg=<value optimized out>) at
> pthread_create.c:300
> #10 0x00007fdd3bc8c70d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>
>
> Questions
>
> 1. Is this a bug in httpd?

Probably not.

> 2. Or could I somehow have caused this with a programming error in my
> module?

That seems more likely.

> 3. Or is enable-pool-debug simply incompatible with the Worker MPM?

Not that I know.

That assertion is triggered when you add a string from pool A to a
table in pool B where A is a child of B (adding headers from the
request_rec to a conn_rec table, for example). It's a lifecycle issue.