Posted to bugs@httpd.apache.org by bu...@apache.org on 2007/03/02 17:32:19 UTC

DO NOT REPLY [Bug 41748] New: - Segmentation violation in httpd - thread/worker

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=41748>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=41748

           Summary: Segmentation violation in httpd - thread/worker
           Product: Apache httpd-2
           Version: 2.0.59
          Platform: Other
        OS/Version: AIX
            Status: NEW
          Severity: critical
          Priority: P2
         Component: worker
        AssignedTo: bugs@httpd.apache.org
        ReportedBy: thulek@cz.ibm.com


We are experiencing abnormal httpd terminations due to SIGSEGV. They seem to be
related to the worker/thread components.

The crashes only happen under higher load, on AIX 5.3.

More detailed info:
--------------------------------------------------------------------------------------------------------------
We run the same binary on more than 200 servers with AIX 5.2 without problems.
We have reproduced the problem with a binary compiled with '-g':

ts9 /work/thulek/httpd-2.0.59/server# dbx /usr/adissys/httpd/bin/httpd
/usr/adissys/httpd/core
Type 'help' for help.
[using memory image in /usr/adissys/httpd/core]
reading symbolic information ...
 
Segmentation fault in pth_usched._event_sleep [/usr/lib/libpthread.a] at
0xd0122bc4 ($t3)
0xd0122bc4 (_event_sleep+0xfc) 80410014        lwz   r2,0x14(r1)
(dbx) t
pth_usched._event_sleep(??, ??, ??, ??, ??, ??) at 0xd0122bc4
pth_usched._event_wait(??, ??) at 0xd012314c
pth_cond._cond_wait_local(??, ??, ??) at 0xd012e820
pth_cond._cond_wait(??, ??, ??) at 0xd012ee58
pth_cond.pthread_cond_wait(??, ??) at 0xd012fa28
apr_thread_cond_wait(cond = 0x1003f65c, mutex = (nil)), line 80 in "thread_cond.c"
ap_queue_pop(queue = (nil), sd = (nil), p = (nil)), line 258 in "fdqueue.c"
worker_thread(thd = (nil), dummy = (nil)), line 809 in "worker.c"
dummy_worker(opaque = (nil)), line 105 in "thread.c"
--------------------------------------------------------------------------------------------------------------

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


DO NOT REPLY [Bug 41748] - Segmentation violation in httpd - thread/worker

Posted by bu...@apache.org.

------- Additional Comments From thulek@cz.ibm.com  2007-03-05 04:59 -------
The OS level is:
  # oslevel -s
  5300-05-02
plus latest levels of bos.64bit, bos.mp, bos.mp64, bos.net.tcp.client,
bos.perf.tools, bos.rte.libc, bos.sysmgt.serv_aid.

The thread that handled the SIGSEGV signal is 4:
(dbx) thread current 4
(dbx) thread
 thread  state-k     wchan    state-u    k-tid   mode held scope function
 $t1     run                  running  1016051     u   no   sys  read
 $t2     run                  terminated 1020147     k   no   sys  _event_sleep
*$t3     run                  blocked  1024245     k   no   sys  _event_sleep
>$t4     run                  running  1028343     u   no   sys  sig_coredump
 $t5     run                  running  1032441     u   no   sys  read
 $t6     run                  running  1036539     u   no   sys  read
 $t7     run                  running  1040637     u   no   sys  read
 $t8     run                  blocked  1044735     u   no   sys  _event_sleep
 $t9     run                  blocked  1048577     u   no   sys  _event_sleep
 $t10    run                  blocked  1052675     u   no   sys  _event_sleep
 $t11    run                  blocked  1056773     u   no   sys  _event_sleep
 $t12    run                  blocked  1060871     u   no   sys  _event_sleep
 $t13    run                  blocked  1064969     u   no   sys  _event_sleep
 $t14    run                  blocked  1069067     u   no   sys  _event_sleep
 $t15    run                  blocked  1073165     u   no   sys  _event_sleep
 $t16    run                  blocked  1077263     u   no   sys  _event_sleep
 $t17    run                  blocked  1081361     u   no   sys  _event_sleep
 $t18    run                  blocked  1085459     u   no   sys  _event_sleep
 $t19    run                  blocked  1089557     u   no   sys  _event_sleep
 $t20    run                  blocked  1093655     u   no   sys  _event_sleep
 $t21    run                  blocked  1097753     u   no   sys  _event_sleep
 $t22    run                  blocked  1101851     u   no   sys  _event_sleep
 $t23    run                  blocked  1105949     u   no   sys  _event_sleep
 $t24    run                  blocked  1110047     u   no   sys  _event_sleep
 $t25    run                  blocked  1114145     u   no   sys  _event_sleep
 $t26    run                  running  1118243     u   no   sys  read
 $t27    run                  running  1122341     u   no   sys  poll
 $t28    run                  running  1126439     u   no   sys  poll
(dbx) t
sig_coredump(sig = 0), line 1050 in "mpm_common.c"
malloc_y.malloc_y(0x2000, 0x0, 0x8, 0x20, 0x0, 0x100fb0f7, 0x100fb0f7, 0x8000)
at 0xd03272c4
malloc_common.malloc_common_53_36(??) at 0xd03248b8
jk_pool_dyn_alloc() at 0xd106d65c
jk_pool_alloc() at 0xd106d77c
jk_b_set_buffer_size() at 0xd107c768
service() at 0xd1082424
jk_handler() at 0xd1065b64
ap_run_handler(r = 0x100380d8), line 152 in "config.c"
ap_invoke_handler(r = 0x1003820c), line 364 in "config.c"
ap_process_request(r = 0x30767818), line 249 in "http_request.c"
ap_process_http_connection(c = 0x100173cc), line 253 in "http_core.c"
ap_run_process_connection(c = 0xf03c73ac), line 43 in "connection.c"
ap_process_connection(c = 0x307697f0, csd = 0x307677e0), line 176 in "connection.c"
process_socket(p = (nil), sock = (nil), my_child_num = 0, my_thread_num = 0,
bucket_alloc = (nil)), line 522 in "worker.c"
worker_thread(thd = (nil), dummy = (nil)), line 842 in "worker.c"
dummy_worker(opaque = (nil)), line 105 in "thread.c"
(dbx)

How would I find the thread that caused the dump?

Tomas


DO NOT REPLY [Bug 41748] - Segmentation violation in httpd - thread/worker

Posted by bu...@apache.org.

------- Additional Comments From thulek@cz.ibm.com  2007-03-16 07:44 -------
While debugging, we discovered that the terminations (SIGSEGV) seemed to be related
to various symptoms of heap corruption (e.g. SIGSEGV in malloc(), or an invalid
pointer passed to free()).

Two different approaches seemed to fix the problem:

1) Using the original build, created on AIX 5.1
Our original httpd build was made on AIX 5.1 and run under AIX 5.3. The default
envvars configuration contains: "MALLOCMULTIHEAP=considersize,heaps:8; export
MALLOCMULTIHEAP;"
After unsetting the environment variable MALLOCMULTIHEAP, no abnormal httpd
terminations occurred.

2) New build of httpd on AIX 5.3
We made another Apache build on AIX 5.3. Running this build with the default
envvars configuration, i.e. with multiheap enabled, was also free of problems.

We are still puzzled as to why the behaviour of malloc (with respect to the
multiheap feature) depended on which version of AIX was used for compilation...
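When it is unclear which allocator settings a process actually inherited from bin/envvars, it can help to have a process report them itself. The sketch below only reads the two variables discussed in this report with getenv(); the helper names report_malloc_env and multiheap_requested are hypothetical, and how the AIX malloc implementation interprets the values is not modelled here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: write the malloc-related settings this process
 * inherited (e.g. via bin/envvars) into buf. Returns snprintf()'s result. */
int report_malloc_env(char *buf, size_t len)
{
    const char *type = getenv("MALLOCTYPE");
    const char *multi = getenv("MALLOCMULTIHEAP");

    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "MALLOCTYPE=%s MALLOCMULTIHEAP=%s",
                    type != NULL ? type : "(unset)",
                    multi != NULL ? multi : "(unset)");
}

/* Hypothetical helper: 1 if MALLOCMULTIHEAP is set to a non-empty value. */
int multiheap_requested(void)
{
    const char *multi = getenv("MALLOCMULTIHEAP");

    return multi != NULL && strlen(multi) > 0;
}
```

Calling report_malloc_env() from a small program run after sourcing bin/envvars, the way apachectl does, shows what a process started that way inherits, which makes it easier to compare the "multiheap set" and "multiheap unset" runs.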


DO NOT REPLY [Bug 41748] - Segmentation violation in httpd - thread/worker

Posted by bu...@apache.org.

------- Additional Comments From trawick@apache.org  2007-03-05 07:10 -------
>How would I find the thread that caused the dump?

That's a bit tricky.  The backtrace you showed was that of the thread that
*triggered* the crash.  However, see the discussion below:

>sig_coredump(sig = 0), line 1050 in "mpm_common.c"
>malloc_y.malloc_y(0x2000, 0x0, 0x8, 0x20, 0x0, 0x100fb0f7, 0x100fb0f7, 0x8000)
>at 0xd03272c4
>malloc_common.malloc_common_53_36(??) at 0xd03248b8
>jk_pool_dyn_alloc() at 0xd106d65c

You have a memory corruption problem, since the AIX heap library is crashing. 
There's no reason to believe that the thread which segfaulted (under control of
mod_jk) is the thread which caused the problem.

AIX heap debugging may help by making the crash occur much closer to the point
where the corruption actually happens.

Very basic usage is to set

export MALLOCTYPE=debug

in bin/envvars. (I assume you use apachectl to start httpd; apachectl reads that
file.)

If you get a different traceback when MALLOCTYPE is enabled and the traceback
doesn't show the AIX heap library, post the traceback to this PR.
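The idea behind heap debugging can be illustrated with a simple guard-byte scheme: surround each block with known bytes and verify them when the block is freed, so an overrun is reported at the free of the damaged block instead of surfacing later as a crash inside malloc() in some unrelated thread. This is a hypothetical sketch of that general technique (dbg_malloc, dbg_free, GUARD_SIZE and GUARD_BYTE are illustrative names), not a description of the AIX debug allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical guard parameters; real debug allocators differ. */
#define GUARD_SIZE 8
#define GUARD_BYTE 0xAB

/* Block layout: [size_t size][front guard][user data][back guard].
 * Alignment handling is simplified for the sketch. */
void *dbg_malloc(size_t size)
{
    unsigned char *raw = malloc(sizeof(size_t) + 2 * GUARD_SIZE + size);
    unsigned char *user;

    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof size);            /* remember the user size */
    user = raw + sizeof(size_t) + GUARD_SIZE;
    memset(user - GUARD_SIZE, GUARD_BYTE, GUARD_SIZE);
    memset(user + size, GUARD_BYTE, GUARD_SIZE);
    return user;
}

/* Checks both guards, then releases the block.
 * Returns 0 if the guards are intact, -1 if either was overwritten. */
int dbg_free(void *p)
{
    unsigned char *user = p;
    unsigned char *raw;
    size_t size, i;
    int damaged = 0;

    assert(p != NULL);              /* sketch: only pointers from dbg_malloc() */
    raw = user - GUARD_SIZE - sizeof(size_t);
    memcpy(&size, raw, sizeof size);
    for (i = 0; i < GUARD_SIZE; i++) {
        if ((user - GUARD_SIZE)[i] != GUARD_BYTE || user[size + i] != GUARD_BYTE)
            damaged = 1;
    }
    free(raw);
    return damaged ? -1 : 0;
}
```

Here the damage is detected at dbg_free() time, which is already closer to the culprit than a later crash in malloc(); allocators that place an inaccessible page right after each block can go further and fault on the bad write itself.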

Here's more information about malloc debugging:

http://publibn.boulder.ibm.com/doc_link/Ja_JP/a_doc_lib/aixprggd/genprogc/debug_malloc.htm

To the extent that it is practical, yank any non-httpd-distributed modules from
the configuration and try to reproduce.  Those which are used less by the
overall population of Apache httpd users are more likely suspects.

The fact that you're using the same binary on 5.2 is an obvious point to
consider.  It has occasionally been observed that the exact same code and
configuration bombs on a newer OS level because OS changes expose the
particular problem, or cause it to be exposed much more frequently.
I recall that AIX 5.2 and 5.3 use different heap libraries by default.
Exposure of certain types of heap corruption defects depends on the heap
library implementation.  A web search for "yorktown watson 5.3" will yield
some discussion of the heap library implementations and perhaps a way to switch
back and forth for investigation.
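Why the same defect can be silent under one heap implementation and fatal under another can be illustrated with a toy bump allocator: if the allocator rounds requests up to a coarse granularity, a small overrun lands in padding; with a finer granularity, the same overrun overwrites the next block's header. Everything below (arena, arena_alloc, the header byte, the granularity parameter) is hypothetical and is not a model of the AIX 5.2 or 5.3 heap libraries.

```c
#include <assert.h>
#include <stddef.h>

/* Toy bump allocator; all names and sizes are hypothetical. */
#define ARENA_SIZE 256
#define MAX_BLOCKS 16
#define HEADER_SIZE 4
#define HEADER_BYTE 0xC3

typedef struct {
    unsigned char mem[ARENA_SIZE];
    size_t used;                     /* bytes handed out so far */
    size_t granularity;              /* payloads are rounded up to this */
    size_t headers[MAX_BLOCKS];      /* offsets of block headers */
    size_t nblocks;
} arena;

void arena_init(arena *a, size_t granularity)
{
    assert(granularity > 0);
    a->used = 0;
    a->granularity = granularity;
    a->nblocks = 0;
}

/* Returns a payload pointer, or NULL if the arena is exhausted. */
unsigned char *arena_alloc(arena *a, size_t size)
{
    size_t rounded = (size + a->granularity - 1) / a->granularity * a->granularity;
    size_t i, hdr = a->used;

    if (a->nblocks == MAX_BLOCKS || HEADER_SIZE + rounded > ARENA_SIZE - a->used)
        return NULL;
    for (i = 0; i < HEADER_SIZE; i++)
        a->mem[hdr + i] = HEADER_BYTE;   /* stand-in for allocator metadata */
    a->headers[a->nblocks++] = hdr;
    a->used += HEADER_SIZE + rounded;
    return a->mem + hdr + HEADER_SIZE;
}

/* Counts block headers whose bytes no longer hold HEADER_BYTE. */
size_t arena_damaged_headers(const arena *a)
{
    size_t b, i, damaged = 0;

    for (b = 0; b < a->nblocks; b++) {
        for (i = 0; i < HEADER_SIZE; i++) {
            if (a->mem[a->headers[b] + i] != HEADER_BYTE) {
                damaged++;
                break;
            }
        }
    }
    return damaged;
}
```

With a granularity of 16, writing 16 bytes into a 12-byte request stays inside the padding and nothing is detected; with a granularity of 4, the same write overwrites the following header. In a real heap library that header is allocator metadata, so the damage tends to surface later as a crash inside malloc() or free(), which is consistent with the symptoms reported in this PR.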


DO NOT REPLY [Bug 41748] - Segmentation violation in httpd - thread/worker

Posted by bu...@apache.org.

trawick@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO




------- Additional Comments From trawick@apache.org  2007-03-02 10:10 -------
The dbx display of parameters is incorrect.  The report of which thread crashed
is presumably incorrect too, since the reported crash location (_event_sleep in
libpthread) is a system function.

The backtrace happens to be the most common one during normal operation, since it
is that of an idle thread.  See what the other threads are doing by using the
"thread" command to list the threads, then "thread current THREADNUM" followed
by "where" to see each backtrace.
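For readers wondering why an idle worker shows up in pthread_cond_wait(): the backtrace in the original report goes from ap_queue_pop() into apr_thread_cond_wait(), i.e. a worker with nothing to do sleeps on a condition variable until a connection is queued for it. The following is a minimal sketch of that producer/consumer pattern; the names fd_queue, queue_push and queue_pop are hypothetical, and this is not the fdqueue.c implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical fixed-size queue of socket descriptors. */
#define QUEUE_CAPACITY 64

typedef struct {
    int fds[QUEUE_CAPACITY];
    int head;                    /* index of the oldest element */
    int count;                   /* number of queued elements */
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
} fd_queue;

void queue_init(fd_queue *q)
{
    q->head = 0;
    q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Producer side: enqueue a descriptor and wake one idle worker.
 * Returns 0 on success, -1 if the queue is full. */
int queue_push(fd_queue *q, int fd)
{
    pthread_mutex_lock(&q->lock);
    if (q->count == QUEUE_CAPACITY) {
        pthread_mutex_unlock(&q->lock);
        return -1;
    }
    q->fds[(q->head + q->count) % QUEUE_CAPACITY] = fd;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Worker side: an idle thread sleeps inside pthread_cond_wait() here,
 * which is why most threads in a core dump show this frame. */
int queue_pop(fd_queue *q)
{
    int fd;

    pthread_mutex_lock(&q->lock);
    while (q->count == 0)        /* loop guards against spurious wakeups */
        pthread_cond_wait(&q->not_empty, &q->lock);
    fd = q->fds[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    assert(q->count >= 0);
    pthread_mutex_unlock(&q->lock);
    return fd;
}
```

Because such threads are merely blocked waiting for work, their frames say little about the crash itself, which is why the backtraces of the other threads are worth inspecting.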

Also, don't try to use AIX 5.3 GA with no maintenance.  I use TL-5 currently.  
I'm pretty sure that ML 1 is fine.  Make sure IY58143 is included in your
current or future maintenance level.  (I don't recall whether or not that is in
ML 1.)


DO NOT REPLY [Bug 41748] - Segmentation violation in httpd - thread/worker

Posted by bu...@apache.org.

rainer.jung@kippdata.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rainer.jung@kippdata.de




------- Additional Comments From rainer.jung@kippdata.de  2008-01-01 18:32 -------
Another user reported mod_jk crashes on AIX.

Our way of detecting a multi-threaded context during the JK build did not work
on AIX. Starting with version 1.2.24, we switched to doing a thread-safe build by
default (and there's a configure switch for those who really want to do a non
thread-safe build).
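What a "thread-safe build" changes can be sketched in a few lines: shared state used by several worker threads is only protected when the code is compiled with a threading switch. The macro EXAMPLE_MT_CODE and the counter below are hypothetical stand-ins, not mod_jk's actual configure option or code; the point is that a build which silently leaves the switch off still compiles and runs, but lets concurrent threads race on shared data, which under the worker MPM can corrupt memory.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical switch: a thread-safe build would define this as 1.
 * Override with -DEXAMPLE_MT_CODE=0 to get the unprotected variant. */
#ifndef EXAMPLE_MT_CODE
#define EXAMPLE_MT_CODE 1
#endif

typedef struct {
    long value;
#if EXAMPLE_MT_CODE
    pthread_mutex_t lock;
#endif
} shared_counter;

void counter_init(shared_counter *c)
{
    assert(c != NULL);
    c->value = 0;
#if EXAMPLE_MT_CODE
    pthread_mutex_init(&c->lock, NULL);
#endif
}

/* Safe to call from many threads only when EXAMPLE_MT_CODE is nonzero. */
long counter_add(shared_counter *c, long delta)
{
    long result;

#if EXAMPLE_MT_CODE
    pthread_mutex_lock(&c->lock);
#endif
    c->value += delta;
    result = c->value;
#if EXAMPLE_MT_CODE
    pthread_mutex_unlock(&c->lock);
#endif
    return result;
}
```

Both variants pass a single-threaded test; the difference only shows up under concurrent load, consistent with the original report that the crashes only happen under higher load.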

It is quite possible that your problem no longer exists once you are using a
recent mod_jk.

It would be nice if you could do a little test and let us know the result.

Thanks.
