Posted to dev@httpd.apache.org by ma...@bellsouth.net on 2003/11/27 18:13:17 UTC

Apache 1.3.28 SEGFAULTS and doesn't produce a core file

Howdy,

We are running Apache 1.3.28 (with auth_ldap & weblogic plugin) on Red Hat
Linux 7.3 (latest kernel and glibc RPMs are installed), and for
some reason, Apache keeps SEGFAULT'ing every 4 - 5 days. When Apache
SEGFAULTS, it doesn't produce a core file, and spits out the following
items in our error_log:

[Thu Nov 20 11:33:56 2003] [notice] child pid 31893 exit signal
Segmentation fault (11)
[Thu Nov 20 11:33:58 2003] [notice] child pid 31878 exit signal
Segmentation fault (11)
[Thu Nov 20 11:34:10 2003] [notice] child pid 31924 exit signal
Segmentation fault (11)
[Thu Nov 20 11:34:12 2003] [notice] child pid 31891 exit signal
Segmentation fault (11)
[Thu Nov 20 11:34:13 2003] [notice] child pid 31825 exit signal
Segmentation fault (11)
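
For anyone who wants to tally these notices, here is a minimal Python sketch that pulls the pid and signal out of "child pid ... exit signal" lines. The regular expression is an assumption based on the lines shown above; the optional "possible coredump in DIR" tail is modeled on the phrase quoted later in this thread, and parse_child_exits is a hypothetical helper name.

```python
import re

# Pattern inferred from the notices above. The optional tail is an assumption
# modeled on the "possible coredump in /some/dir" wording quoted in this thread.
CHILD_EXIT = re.compile(
    r"\[(?P<when>[^\]]+)\] \[notice\] child pid (?P<pid>\d+) exit signal "
    r"(?P<name>.+?) \((?P<signo>\d+)\)"
    r"(?:, possible coredump in (?P<core_dir>\S+))?"
)

def parse_child_exits(log_text):
    """Return one dict per child-exit notice found in log_text."""
    # Mail wrapping split each notice over two lines, so collapse whitespace first.
    flat = " ".join(log_text.split())
    return [
        {
            "when": m.group("when"),
            "pid": int(m.group("pid")),
            "signal": m.group("name"),
            "signo": int(m.group("signo")),
            "core_dir": m.group("core_dir"),  # None when no core was announced
        }
        for m in CHILD_EXIT.finditer(flat)
    ]

sample = """[Thu Nov 20 11:33:56 2003] [notice] child pid 31893 exit signal
Segmentation fault (11)
[Thu Nov 20 11:33:58 2003] [notice] child pid 31878 exit signal
Segmentation fault (11)"""

exits = parse_child_exits(sample)
for e in exits:
    print(e["pid"], e["signal"], "core dir:", e["core_dir"])
```

A core_dir of None for every entry matches what is described here: the children die on signal 11 and no core location is announced.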

I have scoured the apache archives and bugzilla, and have tried various
items to get a core file. I verified that my ulimits aren't stopping cores
from being generated, added "CoreDumpDirectory /free/core" to my
httpd.conf (/free/core is writeable by the parent and children), and
installed the prctl module from:

http://www.apache.org/~trawick/mod_prctl.c

I am still unable to get a core when Apache SEGFAULTS (or when I send
SIGSEGV/SIGABRT signals). Anyone have any recommendations on how I can get
apache to dump core, or determine what is causing apache to die? I have
tried everything I can think of, and have checked every possible
information outlet I can find. I have attached a strace from one of the
child processes that SEGFAULTS. Is it possible to gather additional
information through gdb? I have tried to "attach and continue" a child,
but when it crashes, gdb complains that the stack frames are no longer
available.

Thanks for any insight,
- Matty

Strace:
[pid 32119] send(12,
"\27\3\1\0Xo\"+\263\235\234\345\1\253\320\345\33\234\245"..., 93, 0) = 93
[pid 32119] poll( <unfinished ...>
[pid 32119] <... poll resumed> [{fd=12, events=POLLIN, revents=POLLIN},
{fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}], 5, -1) = 1
[pid 32119] brk(0x810d000)              = 0x810d000
[pid 32119] recv(12, "\27\3\1\0\36", 5, 0) = 5
[pid 32119] recv(12,
"gSqv\263\221&\221\314\237\34\2251\312\316\304\265i\241"..., 30, 0) = 30
[pid 32119] time(NULL)                  = 1069347769
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] time(NULL)                  = 1069347769
[pid 32119] send(12,
"\27\3\1\0\205\5B\2\310\325\263\336\'\333\305\377\220\34"..., 138, 0) =
138
[pid 32119] poll( <unfinished ...>
[pid 32119] <... poll resumed> [{fd=12, events=POLLIN, revents=POLLIN},
{fd=-1}, {fd=-1}, {fd=-1}, {fd=-1}], 5, -1) = 1
[pid 32119] recv(12, "\27\3\1\1\34", 5, 0) = 5
[pid 32119] recv(12,
"\272\330\334\373\226{|\244\3069\332!\352\261i\357\344\21"..., 284, 0) =
284
[pid 32119] time(NULL)                  = 1069347769
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] time(NULL)                  = 1069347769
[pid 32119] send(12,
"\27\3\1\0d\376\303\277\305\211F\371\212.\332\216\37!!\213"..., 105, 0) =
105
[pid 32119] poll([{fd=12, events=POLLIN, revents=POLLIN}, {fd=-1},
{fd=-1}, {fd=-1}, {fd=-1}], 5, -1) = 1
[pid 32119] recv(12, "\27\3\1\0\36", 5, 0) = 5
[pid 32119] recv(12,
"\235g\3\222\342\370:\202\223m7K\363\205\367A\247\275[\225"..., 30, 0) =
30
[pid 32119] time(NULL)                  = 1069347769
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] time([1069347769])          = 1069347769
[pid 32119] fcntl64(4, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0,
len=0}) = 0
[pid 32119] fcntl64(4, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0,
len=0}) = 0
[pid 32119] time([1069347769])          = 1069347769
[pid 32119] fcntl64(4, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0,
len=0}) = 0
[pid 32119] fcntl64(4, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0,
len=0}) = 0
[pid 32119] fcntl64(4, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0,
len=0}) = 0
[pid 32119] fcntl64(4, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0,
len=0}) = 0
[pid 32119] fcntl64(4, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0,
len=0}) = 0
[pid 32119] fcntl64(4, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0,
len=0}) = 0
[pid 32119] fcntl64(4, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0,
len=0}) = 0
[pid 32119] fcntl64(4, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0,
len=0}) = 0
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] open("/tmp/wlproxy.log", O_WRONLY|O_APPEND|O_CREAT, 0666) = 13
[pid 32119] fstat64(13, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid 32119] old_mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40014000
[pid 32119] fstat64(13, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid 32119] _llseek(13, 0, [0], SEEK_SET) = 0
[pid 32119] close(13)                   = 0
[pid 32119] munmap(0x40014000, 4096)    = 0
[pid 32119] socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 13
[pid 32119] fcntl64(13, F_GETFL)        = 0x2 (flags O_RDWR)
[pid 32119] fcntl64(13, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 32119] setsockopt(13, SOL_TCP, TCP_NODELAY, [-1], 4) = 0
[pid 32119] setsockopt(13, SOL_SOCKET, SO_REUSEADDR, [-1], 4) = 0
[pid 32119] connect(13, {sin_family=AF_INET, sin_port=htons(9004),
sin_addr=inet_addr("10.10.224.40")}}, 16) = -1 EINPROGRESS (Operation now
in progress)
[pid 32119] select(14, NULL, [13], [13], {2, 0}) = 1 (out [13], left {1,
990000})
[pid 32119] getsockopt(13, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
[pid 32119] fcntl64(13, F_GETFL)        = 0x802 (flags O_RDWR|O_NONBLOCK)
[pid 32119] fcntl64(13, F_SETFL, O_RDWR) = 0
[pid 32119] select(14, NULL, [13], [13], {300, 0}) = 1 (out [13], left
{300, 0})
[pid 32119] write(13, "GET /overrides/ovr_103.jsp?REC_N"..., 919) = 919
[pid 32119] select(14, [13], NULL, NULL, {300, 0} <unfinished ...>
[pid 32119] <... select resumed> )      = 1 (in [13], left {299, 940000})
[pid 32119] read(13, "HTTP/1.1 200 OK\r\nDate: Thu, 20 N"..., 4096) = 1749
[pid 32119] write(11, "HTTP/1.1 200 OK\r\nDate: Thu, 20 N"..., 1781) =
1781
[pid 32119] select(14, [13], NULL, NULL, {300, 0}) = 1 (in [13], left
{300, 0})
[pid 32119] read(13, "e><th align=\'center\' nowrap><spa"..., 4096) = 1400
[pid 32119] write(11, "e><th align=\'center\' nowrap><spa"..., 1400) =
1400
[pid 32119] select(14, [13], NULL, NULL, {300, 0}) = 1 (in [13], left
{300, 0})
[pid 32119] read(13, "0,geAW(name.name),\'\'); return fa"..., 4096) = 1288
[pid 32119] write(11, "0,geAW(name.name),\'\'); return fa"..., 1288) =
1288
[pid 32119] select(14, [13], NULL, NULL, {300, 0}) = 1 (in [13], left
{300, 0})
[pid 32119] read(13, "class=\'none\' align=\'left\' valign"..., 4096) =
4096
[pid 32119] write(11, "class=\'none\' align=\'left\' valign"..., 4096) =
4096
[pid 32119] select(14, [13], NULL, NULL, {300, 0}) = 1 (in [13], left
{300, 0})
[pid 32119] read(13, "alue=\'\'>\n<input type=\'hidden\' na"..., 4096) =
4096
[pid 32119] write(11, "alue=\'\'>\n<input type=\'hidden\' na"..., 4096) =
4096
[pid 32119] select(14, [13], NULL, NULL, {300, 0}) = 1 (in [13], left
{300, 0})
[pid 32119] read(13, "age_VALUE_REQUIRED()); \n        "..., 4096) = 457
[pid 32119] write(11, "age_VALUE_REQUIRED()); \n        "..., 457) = 457
[pid 32119] close(13)                   = 0
[pid 32119] time(NULL)                  = 1069347769
[pid 32119] write(17, "X.X.X.X- blahblah [20/Nov"..., 313) = 313
[pid 32119] rt_sigaction(SIGUSR1, {0x4002127c, [],
SA_INTERRUPT|0x4000000}, {SIG_IGN}, 8) = 0
[pid 32119] read(11, "GET /system/bbutil.js HTTP/1.1\r\n"..., 4096) = 820
[pid 32119] rt_sigaction(SIGUSR1, {SIG_IGN}, {0x4002127c, [],
SA_INTERRUPT|0x4000000}, 8) = 0
[pid 32119] time(NULL)                  = 1069347769
[pid 32119] stat64("/etc/httpd/htdocs/email/system/bbutil.js",
{st_mode=S_IFREG|0644, st_size=65173, ...}) = 0
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] time([1069347769])          = 1069347769
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] open("/etc/httpd/htdocs/email/system/bbutil.js", O_RDONLY) =
13
[pid 32119] select(12, [11], NULL, NULL, {0, 0}) = 0 (Timeout)
[pid 32119] write(11, "HTTP/1.1 304 Not Modified\r\nDate:"..., 197) = 197
[pid 32119] time(NULL)                  = 1069347769
[pid 32119] write(17, "X.X.X.X- blahblah [20/Nov"..., 357) = 357
[pid 32119] close(13)                   = 0
[pid 32119] rt_sigaction(SIGUSR1, {0x4002127c, [],
SA_INTERRUPT|0x4000000}, {SIG_IGN}, 8) = 0
[pid 32119] read(11,  <unfinished ...>
[pid 32119] <... read resumed> "GET /system/system.css HTTP/1.1\r"...,
4096) = 820
[pid 32119] rt_sigaction(SIGUSR1, {SIG_IGN}, {0x4002127c, [],
SA_INTERRUPT|0x4000000}, 8) = 0
[pid 32119] time(NULL)                  = 1069347769
[pid 32119] stat64("/etc/httpd/htdocs/email/system/system.css",
{st_mode=S_IFREG|0644, st_size=1199, ...}) = 0
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] time([1069347769])          = 1069347769
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] open("/etc/httpd/htdocs/email/system/system.css", O_RDONLY) =
13
[pid 32119] select(12, [11], NULL, NULL, {0, 0}) = 0 (Timeout)
[pid 32119] write(11, "HTTP/1.1 304 Not Modified\r\nDate:"..., 196) = 196
[pid 32119] time(NULL)                  = 1069347769
[pid 32119] write(17, "X.X.X.X- blahblah [20/Nov"..., 358) = 358
[pid 32119] close(13)                   = 0
[pid 32119] rt_sigaction(SIGUSR1, {0x4002127c, [],
SA_INTERRUPT|0x4000000}, {SIG_IGN}, 8) = 0
[pid 32119] read(11,  <unfinished ...>
[pid 32119] <... read resumed> "GET /worktime.css HTTP/1.1\r\nAcc"...,
4096) = 780
[pid 32119] rt_sigaction(SIGUSR1, {SIG_IGN}, {0x4002127c, [],
SA_INTERRUPT|0x4000000}, 8) = 0
[pid 32119] time(NULL)                  = 1069347769
[pid 32119] stat64("/etc/httpd/htdocs/email/worktime.css",
{st_mode=S_IFREG|0644, st_size=12662, ...}) = 0
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] time([1069347769])          = 1069347769
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] open("/tmp/wlproxy.log", O_WRONLY|O_APPEND|O_CREAT, 0666) = 13
[pid 32119] fstat64(13, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid 32119] old_mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40014000
[pid 32119] fstat64(13, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid 32119] _llseek(13, 0, [0], SEEK_SET) = 0
[pid 32119] close(13)                   = 0
[pid 32119] munmap(0x40014000, 4096)    = 0
[pid 32119] socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 13
[pid 32119] fcntl64(13, F_GETFL)        = 0x2 (flags O_RDWR)
[pid 32119] fcntl64(13, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 32119] setsockopt(13, SOL_TCP, TCP_NODELAY, [-1], 4) = 0
[pid 32119] setsockopt(13, SOL_SOCKET, SO_REUSEADDR, [-1], 4) = 0
[pid 32119] connect(13, {sin_family=AF_INET, sin_port=htons(9004),
sin_addr=inet_addr("10.10.224.40")}}, 16) = -1 EINPROGRESS (Operation now
in progress)
[pid 32119] select(14, NULL, [13], [13], {2, 0}) = 1 (out [13], left {2,
0})
[pid 32119] getsockopt(13, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
[pid 32119] fcntl64(13, F_GETFL)        = 0x802 (flags O_RDWR|O_NONBLOCK)
[pid 32119] fcntl64(13, F_SETFL, O_RDWR) = 0
[pid 32119] select(14, NULL, [13], [13], {300, 0}) = 1 (out [13], left
{300, 0})
[pid 32119] write(13, "GET /worktime.css HTTP/1.1\r\nAcc"..., 1026) = 1026
[pid 32119] select(14, [13], NULL, NULL, {300, 0}) = 1 (in [13], left
{299, 990000})
[pid 32119] read(13, "HTTP/1.1 304 Not Modified\r\nDate:"..., 4096) = 157
[pid 32119] close(13)                   = 0
[pid 32119] select(12, [11], NULL, NULL, {0, 0}) = 0 (Timeout)
[pid 32119] write(11, "HTTP/1.1 304 Not Modified\r\nDate:"..., 213) = 213
[pid 32119] time(NULL)                  = 1069347769
[pid 32119] write(17, "X.X.X.X- blahblah [20/Nov"..., 354) = 354
[pid 32119] rt_sigaction(SIGUSR1, {0x4002127c, [],
SA_INTERRUPT|0x4000000}, {SIG_IGN}, 8) = 0
[pid 32119] read(11,  <unfinished ...>
[pid 32119] <... read resumed> "GET /overrides/ovrCode.js HTTP/1"...,
4096) = 825
[pid 32119] rt_sigaction(SIGUSR1, {SIG_IGN}, {0x4002127c, [],
SA_INTERRUPT|0x4000000}, 8) = 0
[pid 32119] time(NULL)                  = 1069347769
[pid 32119] stat64("/etc/httpd/htdocs/email/overrides/ovrCode.js",
{st_mode=S_IFREG|0644, st_size=60773, ...}) = 0
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] time([1069347769])          = 1069347769
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] open("/etc/httpd/htdocs/email/overrides/ovrCode.js", O_RDONLY)
= 13
[pid 32119] select(12, [11], NULL, NULL, {0, 0}) = 0 (Timeout)
[pid 32119] write(11, "HTTP/1.1 304 Not Modified\r\nDate:"..., 198) = 198
[pid 32119] time(NULL)                  = 1069347769
[pid 32119] write(17, "X.X.X.X- blahblah [20/Nov"..., 361) = 361
[pid 32119] close(13)                   = 0
[pid 32119] rt_sigaction(SIGUSR1, {0x4002127c, [],
SA_INTERRUPT|0x4000000}, {SIG_IGN}, 8) = 0
[pid 32119] read(11,  <unfinished ...>
[pid 32119] <... read resumed> "GET /system/ui/DBLookupUIResolve"...,
4096) = 1405
[pid 32119] rt_sigaction(SIGUSR1, {SIG_IGN}, {0x4002127c, [],
SA_INTERRUPT|0x4000000}, 8) = 0
[pid 32119] time(NULL)                  = 1069347772
[pid 32119]
stat64("/etc/httpd/htdocs/email/system/ui/DBLookupUIResolve.jsp",
{st_mode=S_IFREG|0644, st_size=9788, ...}) = 0
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] time([1069347772])          = 1069347772
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] open("/tmp/wlproxy.log", O_WRONLY|O_APPEND|O_CREAT, 0666) = 13
[pid 32119] fstat64(13, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid 32119] old_mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40014000
[pid 32119] fstat64(13, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid 32119] _llseek(13, 0, [0], SEEK_SET) = 0
[pid 32119] close(13)                   = 0
[pid 32119] munmap(0x40014000, 4096)    = 0
[pid 32119] socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 13
[pid 32119] fcntl64(13, F_GETFL)        = 0x2 (flags O_RDWR)
[pid 32119] fcntl64(13, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 32119] setsockopt(13, SOL_TCP, TCP_NODELAY, [-1], 4) = 0
[pid 32119] setsockopt(13, SOL_SOCKET, SO_REUSEADDR, [-1], 4) = 0
[pid 32119] connect(13, {sin_family=AF_INET, sin_port=htons(9004),
sin_addr=inet_addr("10.10.224.40")}}, 16) = -1 EINPROGRESS (Operation now
in progress)
[pid 32119] select(14, NULL, [13], [13], {2, 0}) = 1 (out [13], left {1,
990000})
[pid 32119] getsockopt(13, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
[pid 32119] fcntl64(13, F_GETFL)        = 0x802 (flags O_RDWR|O_NONBLOCK)
[pid 32119] fcntl64(13, F_SETFL, O_RDWR) = 0
[pid 32119] select(14, NULL, [13], [13], {300, 0}) = 1 (out [13], left
{300, 0})
[pid 32119] write(13, "GET /system/ui/DBLookupUIResolve"..., 1651) = 1651
[pid 32119] select(14, [13], NULL, NULL, {300, 0}) = 1 (in [13], left
{299, 970000})
[pid 32119] read(13, "HTTP/1.1 200 OK\r\nDate: Thu, 20 N"..., 4096) = 348
[pid 32119] select(14, [13], NULL, NULL, {300, 0}) = 1 (in [13], left
{300, 0})
[pid 32119] read(13, "            <html>\n<head>\n\t<titl"..., 4096) =
1400
[pid 32119] write(11, "HTTP/1.1 200 OK\r\nDate: Thu, 20 N"..., 1779) =
1779
[pid 32119] select(14, [13], NULL, NULL, {300, 0}) = 1 (in [13], left
{300, 0})
[pid 32119] read(13, "eturn;\r\n\t\t\t\t\t} else\r\n\t\t\t\t\t\ttarge"...,
4096) = 2008
[pid 32119] write(11, "eturn;\r\n\t\t\t\t\t} else\r\n\t\t\t\t\t\ttarge"...,
2008) = 2008
[pid 32119] close(13)                   = 0
[pid 32119] time(NULL)                  = 1069347772
[pid 32119] write(17, "X.X.X.X - blahblah [20/Nov"..., 902) = 902
[pid 32119] rt_sigaction(SIGUSR1, {0x4002127c, [],
SA_INTERRUPT|0x4000000}, {SIG_IGN}, 8) = 0
[pid 32119] read(11,  <unfinished ...>
[pid 32119] <... read resumed> "GET /messaging/businessObject.js"...,
4096) = 724
[pid 32119] rt_sigaction(SIGUSR1, {SIG_IGN}, {0x4002127c, [],
SA_INTERRUPT|0x4000000}, 8) = 0
[pid 32119] time(NULL)                  = 1069347777
[pid 32119] stat64("/etc/httpd/htdocs/email/messaging/businessObject.jsp",
{st_mode=S_IFREG|0644, st_size=12754, ...}) = 0
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] write(11, "HTTP/1.1 401 Authorization Requi"..., 795) = 795
[pid 32119] time(NULL)                  = 1069347777
[pid 32119] write(17, "X.X.X.X- - [20/Nov/2003:1"..., 225) = 225
[pid 32119] rt_sigaction(SIGUSR1, {0x4002127c, [],
SA_INTERRUPT|0x4000000}, {SIG_IGN}, 8) = 0
[pid 32119] read(11, "GET /messaging/businessObject.js"..., 4096) = 775
[pid 32119] rt_sigaction(SIGUSR1, {SIG_IGN}, {0x4002127c, [],
SA_INTERRUPT|0x4000000}, 8) = 0
[pid 32119] time(NULL)                  = 1069347777
[pid 32119] stat64("/etc/httpd/htdocs/email/messaging/businessObject.jsp",
{st_mode=S_IFREG|0644, st_size=12754, ...}) = 0
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] --- SIGSEGV (Segmentation fault) ---
[pid 10610] wait4(-1, [WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV], WNOHANG,
NULL) = 32119
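
With a trace this long, it can help to extract only the last few calls a process made before the signal. The following is a rough Python sketch, assuming the "[pid N] ..." line format shown above and treating any line without that prefix as a mail-wrapped continuation of the previous entry. calls_before_signal is a hypothetical helper, not part of strace.

```python
import re

PID_LINE = re.compile(r"^\[pid\s+(\d+)\]\s*(.*)$")

def calls_before_signal(strace_text, signal="SIGSEGV", count=3):
    """Return (pid, last `count` entries) for the first process hit by `signal`."""
    history = {}    # pid -> list of trace entries seen so far
    current = None  # pid of the most recent "[pid N]" line
    for raw in strace_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = PID_LINE.match(line)
        if m is None:
            # No "[pid N]" prefix: treat as a wrapped continuation line.
            if current is not None and history.get(current):
                history[current][-1] = (history[current][-1] + " " + line).strip()
            continue
        current = int(m.group(1))
        entry = m.group(2)
        if entry.startswith("--- " + signal):
            return current, history.get(current, [])[-count:]
        history.setdefault(current, []).append(entry)
    return None

sample = """[pid 32119] time(NULL)                  = 1069347777
[pid 32119] stat64("/etc/httpd/htdocs/email/messaging/businessObject.jsp",
{st_mode=S_IFREG|0644, st_size=12754, ...}) = 0
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] --- SIGSEGV (Segmentation fault) ---"""

pid, calls = calls_before_signal(sample)
for call in calls:
    print(pid, call)
```

This only shows system calls, so it narrows down which request was in flight but cannot say which module's code faulted.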

Re: Apache 1.3.28 SEGFAULTS and doesn't produce a core file

Posted by ma...@bellsouth.net.
On Mon, 1 Dec 2003, Jeff Trawick wrote:

> FWIW, it segfaults on a jsp request...  I suppose that this is handled by a
> third party module such as mod_jk?  See the final snippet:
>

We are using the BEA weblogic plugin to broker *.jsp to our application
server. We are using the latest QE'ed build of the plugin.

> [pid 32119] read(11, "GET /messaging/businessObject.js"..., 4096) = 775
> [pid 32119] rt_sigaction(SIGUSR1, {SIG_IGN}, {0x4002127c, [],
> SA_INTERRUPT|0x4000000}, 8) = 0
> [pid 32119] time(NULL)                  = 1069347777
> [pid 32119] stat64("/etc/httpd/htdocs/email/messaging/businessObject.jsp",
> {st_mode=S_IFREG|0644, st_size=12754, ...}) = 0
> [pid 32119] getpid()                    = 32119
> [pid 32119] getpid()                    = 32119
> [pid 32119] --- SIGSEGV (Segmentation fault) ---
> [pid 10610] wait4(-1, [WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV], WNOHANG,

We are also using auth_ldap to authenticate users, and authentication
looks to be occurring right before the program SEGFAULTS. Since I cannot
associate the getpid() calls with a module, I cannot pinpoint which module is
causing Apache grief.

>
> Common issues with getting coredumps from Apache 1.3 on Linux 2.4 kernel:
>
> 1) prctl() call, resolved by mod_prctl or patch like that posted on the list

Even with the prctl module, Apache refuses to dump core. Once I start
Apache as the user "apache," it dumps core perfectly. It looks like
the Linux kernel refuses to let any program that changes its uid
dump core. I wish this restriction could be disabled through "/proc" or
sysctl until I get a core to analyze.

> 2) ulimit -c setting in the shell used to start Apache

Our ulimits are cool.


> 3) CoredumpDirectory directive MUST BE SPECIFIED and must point to a directory
> that the web server user id (e.g., "nobody") has write access to; additionally,
> there must be plenty of free space in that directory; without the
> CoredumpDirectory, the default directory (serverroot) will be used, and the web
> server user id doesn't have write access there

Got all three of these items taken care of.

>
> Check #3 in particular.  I don't think it was mentioned in the responses to
> your post.
>
> Another thing: To verify that the problem with core dump is not your
> system/Apache configuration but instead something horrible that happens due to
> the way the child process crashes, send SIGSEGV to some random child process*
> and verify that the kernel is able to write a coredump and that the message
> written to the Apache error log has "possible coredump in /some/dir" as part of
> the "child pid XXX exit signal YYY" message
>
> *obviously you may not want to do that if the server is in production :)
>

Thanks for your feedback. I am going to reconfigure my web servers to
start Apache as a non-root user, and run the web servers on port 8000.
Since we have a pair of load-balancers in front of the web servers, this
shouldn't cause us too much grief.

Re: Apache 1.3.28 SEGFAULTS and doesn't produce a core file

Posted by Joe Orton <jo...@redhat.com>.
On Mon, Dec 01, 2003 at 07:52:03AM -0500, Jeff Trawick wrote:
> FWIW, it segfaults on a jsp request...  I suppose that this is handled by a 
> third party module such as mod_jk?  See the final snippet:
> 
> [pid 32119] read(11, "GET /messaging/businessObject.js"..., 4096) = 775
> [pid 32119] rt_sigaction(SIGUSR1, {SIG_IGN}, {0x4002127c, [],
> SA_INTERRUPT|0x4000000}, 8) = 0
> [pid 32119] time(NULL)                  = 1069347777
> [pid 32119] stat64("/etc/httpd/htdocs/email/messaging/businessObject.jsp",
> {st_mode=S_IFREG|0644, st_size=12754, ...}) = 0
> [pid 32119] getpid()                    = 32119
> [pid 32119] getpid()                    = 32119
> [pid 32119] --- SIGSEGV (Segmentation fault) ---
> [pid 10610] wait4(-1, [WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV], WNOHANG,
> 
> Common issues with getting coredumps from Apache 1.3 on Linux 2.4 kernel:
> 
> 1) prctl() call, resolved by mod_prctl or patch like that posted on the list

Unfortunately, some of our kernels still have broken
prctl(PR_SET_DUMPABLE, 1) support after some over-aggressive ptrace
security fixes were added; the 7.3 kernel appears to be one.

joe
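
To illustrate the flag under discussion, here is a small Python sketch that reads a process's own "dumpable" flag through prctl(PR_GET_DUMPABLE). It is only an illustration of the system call, not what mod_prctl does: it inspects the Python process that runs it, not an Apache child. The constant value is taken from <linux/prctl.h>, and get_dumpable is a hypothetical helper name.

```python
import ctypes
import sys

PR_GET_DUMPABLE = 3  # option number from <linux/prctl.h>

def get_dumpable():
    """Return this process's dumpable flag, or None when not running on Linux."""
    if not sys.platform.startswith("linux"):
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    zero = ctypes.c_ulong(0)
    # The extra arguments are unused for PR_GET_DUMPABLE.
    result = libc.prctl(PR_GET_DUMPABLE, zero, zero, zero, zero)
    if result < 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_GET_DUMPABLE) failed")
    return result

if __name__ == "__main__":
    # 0 means the kernel will not write a core for this process; 1 means it may.
    print("dumpable flag:", get_dumpable())
```

A process that has changed its uid would typically report 0 here unless PR_SET_DUMPABLE took effect, which is the step that appears not to work on the kernel in question.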

Re: Apache 1.3.28 SEGFAULTS and doesn't produce a core file

Posted by ma...@bellsouth.net.
On Mon, 1 Dec 2003, Jeff Trawick wrote:

> Jeff Trawick wrote:
>
> > FWIW, it segfaults on a jsp request...  I suppose that this is handled
> > by a third party module such as mod_jk?  See the final snippet:
>
> I was reminded by a little bird that you had mentioned in your original message
> that you were using the WebLogic plug-in.  Maybe some hints here will help you
> get a core dump, but regardless:  If you're using a commercially supported
> product and that product was likely doing something around the time of the
> crash (which seems likely), make sure you report it to them.  They can at least

I have had a case open with BEA since the issue started. They don't think (based
on a large strace I sent in) that the SEGFAULTS are occurring within their module.
Once I start up Apache as the user "apache," I should be able to get a stack trace
to see what's making Apache angry. I also plan to strip all modules but the following:

LoadModule config_log_module  libexec/mod_log_config.so
LoadModule mime_module        libexec/mod_mime.so
LoadModule dir_module         libexec/mod_dir.so
LoadModule access_module      libexec/mod_access.so
LoadModule auth_module        libexec/mod_auth.so

ClearModuleList
AddModule mod_log_config.c
AddModule mod_mime.c
AddModule mod_dir.c
AddModule mod_access.c
AddModule mod_auth.c
AddModule mod_so.c

I will also need to add the weblogic and auth_ldap modules to the mix.

> make sure you have their latest and greatest code, and if you're lucky even
> verify that there was some known issue that could explain the crashes.  Another

I have the latest source build.

> general debug technique that applies here: If you can verify that a crash only
> occurs for requests handled by that plug-in, that is more information they can
> use as well.

Thanks for the feedback.

Re: Apache 1.3.28 SEGFAULTS and doesn't produce a core file

Posted by Jeff Trawick <tr...@attglobal.net>.
Jeff Trawick wrote:

> FWIW, it segfaults on a jsp request...  I suppose that this is handled 
> by a third party module such as mod_jk?  See the final snippet:

I was reminded by a little bird that you had mentioned in your original message 
that you were using the WebLogic plug-in.  Maybe some hints here will help you 
get a core dump, but regardless:  If you're using a commercially supported 
product and that product was likely doing something around the time of the 
crash (which seems likely), make sure you report it to them.  They can at least 
make sure you have their latest and greatest code, and if you're lucky even 
verify that there was some known issue that could explain the crashes.  Another 
general debug technique that applies here: If you can verify that a crash only 
occurs for requests handled by that plug-in, that is more information they can 
use as well.



Re: Apache 1.3.28 SEGFAULTS and doesn't produce a core file

Posted by Jeff Trawick <tr...@attglobal.net>.
FWIW, it segfaults on a jsp request...  I suppose that this is handled by a 
third party module such as mod_jk?  See the final snippet:

[pid 32119] read(11, "GET /messaging/businessObject.js"..., 4096) = 775
[pid 32119] rt_sigaction(SIGUSR1, {SIG_IGN}, {0x4002127c, [],
SA_INTERRUPT|0x4000000}, 8) = 0
[pid 32119] time(NULL)                  = 1069347777
[pid 32119] stat64("/etc/httpd/htdocs/email/messaging/businessObject.jsp",
{st_mode=S_IFREG|0644, st_size=12754, ...}) = 0
[pid 32119] getpid()                    = 32119
[pid 32119] getpid()                    = 32119
[pid 32119] --- SIGSEGV (Segmentation fault) ---
[pid 10610] wait4(-1, [WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV], WNOHANG,

Common issues with getting coredumps from Apache 1.3 on Linux 2.4 kernel:

1) prctl() call, resolved by mod_prctl or patch like that posted on the list
2) ulimit -c setting in the shell used to start Apache
3) CoredumpDirectory directive MUST BE SPECIFIED and must point to a directory 
that the web server user id (e.g., "nobody") has write access to; additionally, 
there must be plenty of free space in that directory; without the 
CoredumpDirectory, the default directory (serverroot) will be used, and the web 
server user id doesn't have write access there

Check #3 in particular.  I don't think it was mentioned in the responses to 
your post.

Another thing: To verify that the problem with core dump is not your 
system/Apache configuration but instead something horrible that happens due to 
the way the child process crashes, send SIGSEGV to some random child process* 
and verify that the kernel is able to write a coredump and that the message 
written to the Apache error log has "possible coredump in /some/dir" as part of 
the "child pid XXX exit signal YYY" message

*obviously you may not want to do that if the server is in production :)
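
Item 3 can be checked mechanically. A minimal Python sketch follows, assuming it is run as the web server user (os.access tests the permissions of whoever runs it). The 512 MB free-space threshold is an arbitrary assumption standing in for "plenty of free space", and check_core_dir is a hypothetical helper name.

```python
import os
import shutil
import tempfile

def check_core_dir(path, min_free_bytes=512 * 1024 * 1024):
    """List reasons a core file would probably not land in `path`."""
    if not os.path.isdir(path):
        return [f"{path} is not a directory"]
    problems = []
    # Writing a new file needs write and search permission on the directory.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by uid {os.getuid()}")
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free in {path}")
    return problems

# Self-contained demonstration against a scratch directory.
with tempfile.TemporaryDirectory() as scratch:
    print(check_core_dir(scratch, min_free_bytes=0))
    print(check_core_dir(os.path.join(scratch, "missing")))
```

In practice the argument would be the directory named by CoredumpDirectory, for example the /free/core path from the original post.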



Re: Apache 1.3.28 SEGFAULTS and doesn't produce a core file

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Thursday, November 27, 2003 11:27 PM -0500 mattyml@bellsouth.net wrote:

> That is a good idea, but there has to be a way to debug things in their
> current state. Since this is the configuration that causes Apache to

No, there's not.  It's a problem with Linux not httpd.

> SEGFAULT, I would like to capture a core/stack trace when this
> configuration goes south. I am hoping one of the developers can shed some
> light on this.

AFAIK, Sander's suggestion is the only one you can use on 1.3: run on a higher 
port that doesn't invoke the setuid calls.

For 2.1, Jeff's added the (build-time optional) fatal_exception hook to allow 
modules to dump the stack trace to the error log before it dies.  But, that 
won't help you on 1.3.  -- justin

Re: Apache 1.3.28 SEGFAULTS and doesn't produce a core file

Posted by ma...@bellsouth.net.
On Thu, 27 Nov 2003, Sander Temme wrote:

> >>> some reason, Apache keeps SEGFAULT'ing every 4 - 5 days. When apache
> >>> SEGFAULTS, it doesn't produce a core file, and spits out the following
> >>> items in our error_log:
> >>
> >> Did you set your ulimit -S -c to a value other than 0? You need to do that
> >
> > I sure did. I read somewhere that the Linux 2.4 kernel won't allow
> > processes that change their user ids to dump core. I see why they did
> > that, but it is really putting me in a bind. Are there any other ways
> > to determine why Apache is dying? Any other thoughts on why the
> > prctl plugin isn't allowing Apache to dump core?
>
> Yeah, that's strange because, looking at the code, this is exactly what the
> module is supposed to do. You could run your server as non-root, so it
> doesn't setuid and setgid on the children: this would take away the kernel's

That is a good idea, but there has to be a way to debug things in their
current state. Since this is the configuration that causes Apache to
SEGFAULT, I would like to capture a core/stack trace when this
configuration goes south. I am hoping one of the developers can shed some
light on this.

> issues with letting the children dump core. The server would have to be able
> to bind to its listening port(s) (use >1024) and write to its log directory
> and files.
>
> S.
>
> --
> Covalent Technologies                 sctemme@covalent.net
> Engineering group                    Voice: (415) 856 4214
> 303 Second Street #375 South           Fax: (415) 856 4210
> San Francisco CA 94107
>
> PGP FP: 7A8D B189 E871 80CB 9521  9320 C11E 7B47 964F 31D9
>
> =======================================================
> This email message is for the sole use of the intended
> recipient(s) and may contain confidential and privileged
> information. Any unauthorized review, use, disclosure or
> distribution is prohibited.  If you are not the intended
> recipient, please contact the sender by reply email and
> destroy all copies of the original message
> =======================================================
>

Re: Apache 1.3.28 SEGFAULTS and doesn't produce a core file

Posted by Sander Temme <sc...@covalent.net>.
>>> some reason, Apache keeps SEGFAULT'ing every 4 - 5 days. When apache
>>> SEGFAULTS, it doesn't produce a core file, and spits out the following
>>> items in our error_log:
>> 
>> Did you set your ulimit -S -c to a value other than 0? You need to do that
> 
> I sure did. I read somewhere that the Linux 2.4 kernel won't allow
> processes that change their user ids to dump core. I see why they did
> that, but it is really putting me in a bind. Are there any other ways
> to determine why Apache is dying? Any other thoughts on why the
> prctl plugin isn't allowing Apache to dump core?

Yeah, that's strange because, looking at the code, this is exactly what the
module is supposed to do. You could run your server as non-root, so it
doesn't setuid and setgid on the children: this would take away the kernel's
issues with letting the children dump core. The server would have to be able
to bind to its listening port(s) (use >1024) and write to its log directory
and files.

S.



Re: Apache 1.3.28 SEGFAULTS and doesn't produce a core file

Posted by ma...@bellsouth.net.
On Thu, 27 Nov 2003, Sander Temme wrote:

> > some reason, Apache keeps SEGFAULT'ing every 4 - 5 days. When apache
> > SEGFAULTS, it doesn't produce a core file, and spits out the following
> > items in our error_log:
>
> Did you set your ulimit -S -c to a value other than 0? You need to do that

I sure did. I read somewhere that the Linux 2.4 kernel won't allow
processes that change their user ids to dump core. I see why they did
that, but it is really putting me in a bind. Are there any other ways
to determine why Apache is dying? Any other thoughts on why the
prctl plugin isn't allowing Apache to dump core?

> in the shell you start Apache from, before you start the server. If this is
> set, Apache will write in the log file that it attempted a core dump and
> where it expects that to go.

>
> S.

Re: Apache 1.3.28 SEGFAULTS and doesn't produce a core file

Posted by Sander Temme <sc...@covalent.net>.
> some reason, Apache keeps SEGFAULT'ing every 4 - 5 days. When apache
> SEGFAULTS, it doesn't produce a core file, and spits out the following
> items in our error_log:

Did you set your ulimit -S -c to a value other than 0? You need to do that
in the shell you start Apache from, before you start the server. If this is
set, Apache will write in the log file that it attempted a core dump and
where it expects that to go.

S.
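
The limit in question can also be inspected programmatically. Below is a small Python sketch using the standard resource module. It reports the limit of the process that runs it, so it would need to be started from the same shell that starts Apache to be meaningful. describe_core_limit and current_core_limit are hypothetical helper names.

```python
import resource

def describe_core_limit(soft, hard):
    """Summarize an RLIMIT_CORE (soft, hard) pair in plain words."""
    if soft == 0:
        if hard == 0:
            return "disabled; hard limit is 0 too, so only a privileged process can raise it"
        return "disabled; raise the soft limit (ulimit -S -c) before starting Apache"
    if soft == resource.RLIM_INFINITY:
        return "unlimited"
    return f"limited to {soft} bytes"

def current_core_limit():
    """Return the (soft, hard) core file size limit of this process."""
    return resource.getrlimit(resource.RLIMIT_CORE)

if __name__ == "__main__":
    soft, hard = current_core_limit()
    print("core file size:", describe_core_limit(soft, hard))
```

A non-zero soft limit is necessary but not sufficient: the thread above shows cores can still be suppressed after a uid change.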
