Posted to bugs@httpd.apache.org by bu...@apache.org on 2007/03/02 17:32:19 UTC
DO NOT REPLY [Bug 41748] New: - Segmentation violation in httpd - thread/worker
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=41748>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.
http://issues.apache.org/bugzilla/show_bug.cgi?id=41748
Summary: Segmentation violation in httpd - thread/worker
Product: Apache httpd-2
Version: 2.0.59
Platform: Other
OS/Version: AIX
Status: NEW
Severity: critical
Priority: P2
Component: worker
AssignedTo: bugs@httpd.apache.org
ReportedBy: thulek@cz.ibm.com
httpd is terminating abnormally with SIGSEGV. It seems to be related to
the worker/thread components.
The crashes only happen under higher load, on AIX 5.3.
More detailed info:
--------------------------------------------------------------------------------------------------------------
We run the same binary on more than 200 servers with AIX 5.2 without problems.
We have reproduced the problem with a binary compiled with '-g':
ts9 /work/thulek/httpd-2.0.59/server# dbx /usr/adissys/httpd/bin/httpd
/usr/adissys/httpd/core
Type 'help' for help.
[using memory image in /usr/adissys/httpd/core]
reading symbolic information ...
Segmentation fault in pth_usched._event_sleep [/usr/lib/libpthread.a] at
0xd0122bc4 ($t3)
0xd0122bc4 (_event_sleep+0xfc) 80410014 lwz r2,0x14(r1)
(dbx) t
pth_usched._event_sleep(??, ??, ??, ??, ??, ??) at 0xd0122bc4
pth_usched._event_wait(??, ??) at 0xd012314c
pth_cond._cond_wait_local(??, ??, ??) at 0xd012e820
pth_cond._cond_wait(??, ??, ??) at 0xd012ee58
pth_cond.pthread_cond_wait(??, ??) at 0xd012fa28
apr_thread_cond_wait(cond = 0x1003f65c, mutex = (nil)), line 80 in "thread_cond.c"
ap_queue_pop(queue = (nil), sd = (nil), p = (nil)), line 258 in "fdqueue.c"
worker_thread(thd = (nil), dummy = (nil)), line 809 in "worker.c"
dummy_worker(opaque = (nil)), line 105 in "thread.c"
--------------------------------------------------------------------------------------------------------------
--
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org
DO NOT REPLY [Bug 41748] - Segmentation violation in httpd - thread/worker
Posted by bu...@apache.org.
------- Additional Comments From thulek@cz.ibm.com 2007-03-05 04:59 -------
The OS level is:
# oslevel -s
5300-05-02
plus latest levels of bos.64bit, bos.mp, bos.mp64, bos.net.tcp.client,
bos.perf.tools, bos.rte.libc, bos.sysmgt.serv_aid.
The thread that handled the SIGSEGV signal is 4:
(dbx) thread current 4
(dbx) thread
 thread  state-k  wchan  state-u     k-tid    mode  held  scope  function
 $t1     run             running     1016051  u     no    sys    read
 $t2     run             terminated  1020147  k     no    sys    _event_sleep
*$t3     run             blocked     1024245  k     no    sys    _event_sleep
>$t4     run             running     1028343  u     no    sys    sig_coredump
 $t5     run             running     1032441  u     no    sys    read
 $t6     run             running     1036539  u     no    sys    read
 $t7     run             running     1040637  u     no    sys    read
 $t8     run             blocked     1044735  u     no    sys    _event_sleep
 $t9     run             blocked     1048577  u     no    sys    _event_sleep
 $t10    run             blocked     1052675  u     no    sys    _event_sleep
 $t11    run             blocked     1056773  u     no    sys    _event_sleep
 $t12    run             blocked     1060871  u     no    sys    _event_sleep
 $t13    run             blocked     1064969  u     no    sys    _event_sleep
 $t14    run             blocked     1069067  u     no    sys    _event_sleep
 $t15    run             blocked     1073165  u     no    sys    _event_sleep
 $t16    run             blocked     1077263  u     no    sys    _event_sleep
 $t17    run             blocked     1081361  u     no    sys    _event_sleep
 $t18    run             blocked     1085459  u     no    sys    _event_sleep
 $t19    run             blocked     1089557  u     no    sys    _event_sleep
 $t20    run             blocked     1093655  u     no    sys    _event_sleep
 $t21    run             blocked     1097753  u     no    sys    _event_sleep
 $t22    run             blocked     1101851  u     no    sys    _event_sleep
 $t23    run             blocked     1105949  u     no    sys    _event_sleep
 $t24    run             blocked     1110047  u     no    sys    _event_sleep
 $t25    run             blocked     1114145  u     no    sys    _event_sleep
 $t26    run             running     1118243  u     no    sys    read
 $t27    run             running     1122341  u     no    sys    poll
 $t28    run             running     1126439  u     no    sys    poll
(dbx) t
sig_coredump(sig = 0), line 1050 in "mpm_common.c"
malloc_y.malloc_y(0x2000, 0x0, 0x8, 0x20, 0x0, 0x100fb0f7, 0x100fb0f7, 0x8000)
at 0xd03272c4
malloc_common.malloc_common_53_36(??) at 0xd03248b8
jk_pool_dyn_alloc() at 0xd106d65c
jk_pool_alloc() at 0xd106d77c
jk_b_set_buffer_size() at 0xd107c768
service() at 0xd1082424
jk_handler() at 0xd1065b64
ap_run_handler(r = 0x100380d8), line 152 in "config.c"
ap_invoke_handler(r = 0x1003820c), line 364 in "config.c"
ap_process_request(r = 0x30767818), line 249 in "http_request.c"
ap_process_http_connection(c = 0x100173cc), line 253 in "http_core.c"
ap_run_process_connection(c = 0xf03c73ac), line 43 in "connection.c"
ap_process_connection(c = 0x307697f0, csd = 0x307677e0), line 176 in "connection.c"
process_socket(p = (nil), sock = (nil), my_child_num = 0, my_thread_num = 0,
bucket_alloc = (nil)), line 522 in "worker.c"
worker_thread(thd = (nil), dummy = (nil)), line 842 in "worker.c"
dummy_worker(opaque = (nil)), line 105 in "thread.c"
(dbx)
How would I find the thread that caused the dump?
Tomas
DO NOT REPLY [Bug 41748] - Segmentation violation in httpd - thread/worker
Posted by bu...@apache.org.
------- Additional Comments From thulek@cz.ibm.com 2007-03-16 07:44 -------
While debugging we discovered that the terminations (SIGSEGV) seemed to be related to
various symptoms of heap corruption (e.g. SIGSEGV in malloc(), or an invalid pointer
passed to free()).
Two different approaches seemed to fix the problem:
1) Using the original build, created on AIX 5.1
Our original httpd build was made on AIX 5.1 and run under AIX 5.3. The default
envvars configuration contains: "MALLOCMULTIHEAP=considersize,heaps:8; export
MALLOCMULTIHEAP;"
After unsetting the environment variable MALLOCMULTIHEAP no abnormal httpd
terminations occurred.
2) New build of httpd on AIX 5.3
We made another Apache build on AIX 5.3. Running this build with the default
envvars configuration (i.e. with multiheap enabled) was also without problems.
We are still puzzled as to why the behaviour of malloc (with respect to the
multiheap feature) depended on which version of AIX was used for compilation...
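A minimal sketch of approach 1, assuming bin/envvars is a Bourne shell fragment that apachectl sources before starting httpd. The first two lines restate the default setting quoted above; the unset line is the workaround. The exact layout of a given envvars file may differ.

```shell
# Default setting as quoted in the report (multiheap malloc with 8 heaps).
MALLOCMULTIHEAP=considersize,heaps:8
export MALLOCMULTIHEAP

# Workaround from approach 1: remove the variable from the environment
# so that httpd starts with the default single-heap allocator behaviour.
unset MALLOCMULTIHEAP
```

In practice one would simply delete or comment out the two MALLOCMULTIHEAP lines rather than set and then unset the variable; both steps are shown here only to make the before and after states explicit.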
DO NOT REPLY [Bug 41748] - Segmentation violation in httpd - thread/worker
Posted by bu...@apache.org.
------- Additional Comments From trawick@apache.org 2007-03-05 07:10 -------
>How would I find the thread that caused the dump?
That's a bit tricky. The backtrace you showed was that of the thread that
*triggered* the crash. However, see the discussion below:
>sig_coredump(sig = 0), line 1050 in "mpm_common.c"
>malloc_y.malloc_y(0x2000, 0x0, 0x8, 0x20, 0x0, 0x100fb0f7, 0x100fb0f7, 0x8000)
at 0xd03272c4
>malloc_common.malloc_common_53_36(??) at 0xd03248b8
>jk_pool_dyn_alloc() at 0xd106d65c
You have a memory corruption problem, since the AIX heap library is crashing.
There's no reason to believe that the thread which segfaulted (under control of
mod_jk) is the thread which caused the problem.
Possibly the AIX heap debugging will help by causing the crash to occur much
closer to the point of the problem.
Very basic usage is to set
export MALLOCTYPE=debug
in bin/envvars (I assume you use apachectl to start up; apachectl will read that
file)
If you get a different traceback when MALLOCTYPE is enabled and the traceback
doesn't show the AIX heap library, post the traceback to this PR.
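A minimal sketch of that envvars change, assuming envvars is a Bourne shell fragment read by apachectl. MALLOCTYPE=debug is the setting named above; the assign-then-export form is used here only because it also works in older shells.

```shell
# Enable the AIX debug malloc for every process started from this environment.
# Expect a noticeable slowdown; use it only while hunting the corruption.
MALLOCTYPE=debug
export MALLOCTYPE
```

Restart httpd through apachectl afterwards so that the child processes inherit the variable, and remove the lines again once a useful traceback has been captured.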
Here's more information about malloc debugging:
http://publibn.boulder.ibm.com/doc_link/Ja_JP/a_doc_lib/aixprggd/genprogc/debug_malloc.htm
To the extent that it is practical, yank any non-httpd-distributed modules from
the configuration and try to reproduce. Those which are used less by the
overall population of Apache httpd users are more likely suspects.
That you're running the same binary without problems on 5.2 is an obvious point to
consider. It has occasionally been observed that the exact same code and
configuration bombs on a newer OS level because of OS changes which expose a
latent problem, or which allow it to be exposed much more frequently.
I recall that AIX 5.2 and 5.3 use different heap libraries by default.
Exposure of certain types of heap corruption defects is dependent on the heap
library implementation. A web search for ->yorktown watson 5.3<- will yield
some discussion of the heap library implementations and perhaps a way to switch
back and forth for investigation.
DO NOT REPLY [Bug 41748] - Segmentation violation in httpd - thread/worker
Posted by bu...@apache.org.
trawick@apache.org changed:
      What|Removed                     |Added
----------------------------------------------------------------------------
    Status|NEW                         |NEEDINFO
------- Additional Comments From trawick@apache.org 2007-03-02 10:10 -------
The dbx display of parameters is incorrect. The display of which thread crashed
is presumably incorrect too, since that is a system function.
The backtrace happens to be the most common one during normal operation since it
is that of an idle thread. See what the other threads are doing by using the
"thread" command to list threads, then "thread current THREADNUM" followed by
"where" to see the backtrace.
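Typing those commands for every thread gets tedious, so here is a minimal sketch that generates them. It assumes the dbx on your AIX level accepts a command file via -c; gen_dbx_cmds and the /tmp/threads.dbx path are hypothetical names chosen for illustration.

```shell
# Print the dbx subcommands needed to list all threads and then show
# a backtrace for each of threads 1..N (N is the first argument).
gen_dbx_cmds() {
    n=$1
    i=1
    echo "thread"
    while [ "$i" -le "$n" ]; do
        echo "thread current $i"
        echo "where"
        i=$((i + 1))
    done
    echo "quit"
}

# Example use (left commented out so the sketch runs without dbx installed):
#   gen_dbx_cmds 28 > /tmp/threads.dbx
#   dbx -c /tmp/threads.dbx /usr/adissys/httpd/bin/httpd /usr/adissys/httpd/core
```

Take the thread count from the output of a first manual "thread" command; asking for more threads than exist should only produce dbx error messages for the missing ones.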
Also, don't try to use AIX 5.3 GA with no maintenance. I use TL-5 currently.
I'm pretty sure that ML 1 is fine. Make sure IY58143 is included in your
current or future maintenance level. (I don't recall whether or not that is in
ML 1.)
DO NOT REPLY [Bug 41748] - Segmentation violation in httpd - thread/worker
Posted by bu...@apache.org.
rainer.jung@kippdata.de changed:
      What|Removed                     |Added
----------------------------------------------------------------------------
        CC|                            |rainer.jung@kippdata.de
------- Additional Comments From rainer.jung@kippdata.de 2008-01-01 18:32 -------
Another user reported crashes on AIX for mod_jk.
Our way of detecting a multi-threaded context during JK build was not working
for AIX. We switched to doing a thread-safe build by default in version 1.2.24+
(and there's a configure switch for those who really want to do a
non-thread-safe build).
It is quite likely that your problem no longer exists once you are using a
recent mod_jk.
It would be nice if you could run a little test and let us know the result.
Thanks.