Posted to modperl@perl.apache.org by Jim Brandt <cb...@buffalo.edu> on 2003/04/11 19:25:20 UTC
Apache/1.3.26 (Unix) mod_perl/1.27 seg faults
We're running mod_perl with apache on Solaris 8 with Perl 5.6.1. While
doing some load tests on a script, we've seen intermittent seg faults
and bus errors. These only appear under load:
[Thu Apr 10 15:09:06 2003] [notice] child pid 11276 exit signal
Segmentation Fault (11)
[Thu Apr 10 15:09:49 2003] [notice] child pid 11265 exit signal Bus
Error (10)
In addition to mod_perl, the only non-standard module we have installed
is mod_auth_dce (at version 3.2). The script is using the following
modules:
use CGI qw(:standard);
use URI::Escape;
use DBI;
use Date::Calc;
use Template;
use Cache::FileCache;
use IO::Socket;
At first we thought Apache::DBI might be the culprit and we saw the
following in the debug logs:
12729 Apache::DBI push PerlCleanupHandler
12729 Apache::DBI need ping: yes
12729 Apache::DBI new connect to XXXXXX#PrintError=1AutoCommit=0RaiseError=1'
12729 Apache::DBI disconnect (overloaded)
---> When a server crashed it stopped here. <---
12729 Apache::DBI PerlCleanupHandler
12729 Apache::DBI PerlCleanupHandler rollback for
We did some tests without DBI and determined (we think) that DBI isn't
the problem.
We were able to catch some processes with truss. On a seg fault
response, the truss looked like this:
.....
.....
10300: write(3, " D a t a b a s e O K !", 12) = 12
10300: write(3, " < / P >", 4) = 4    <-- last item on page
10300: time() = 1049901880
10300: stat64("/tmp/FileCache/webreg/1/f/b/", 0x001E2610) = 0
10300: open64("/tmp/FileCache/webreg/1/f/b/1fbefee9cfb86926757519555e077fd6a21aef0f.tmp10300", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 9
10300: fstat64(9, 0x001E2738) = 0
10300: fcntl(9, F_SETFD, 0x00000001) = 0
10300: fstat64(9, 0xFFBEF398) = 0
10300: ioctl(9, TCGETA, 0xFFBEF324) Err#25 ENOTTY
10300: write(9, "040404 4 3 2 1040404\b02".., 151) = 151
10300: close(9) = 0
10300: rename("/tmp/FileCache/webreg/1/f/b/1fbefee9cfb86926757519555e077fd6a21aef0f.tmp10300", "/tmp/FileCache/webreg/1/f/b/1fbefee9cfb86926757519555e077fd6a21aef0f") = 0
10300: getcontext(0xFFBEF0A8)
10300: write(2, " 1 0 3 0 0 A p a c h e".., 55) = 55
10300: chdir("/") = 0
10300: Incurred fault #5, FLTACCESS %pc = 0x0011FBAC
10300: siginfo: SIGBUS BUS_ADRALN addr=0x7061637B
10300: Received signal #10, SIGBUS [caught]
10300: siginfo: SIGBUS BUS_ADRALN addr=0x7061637B
10300: sigaction(SIGBUS, 0xFFBEF048, 0x00000000) = 0
10300: sigprocmask(SIG_SETMASK, 0x7FBCEFF0, 0x00000000) = 0
10300: chdir("/usr/local/apache/cores") = 0
10300: sigaction(SIGBUS, 0xFFBEEEC8, 0xFFBEEFC8) = 0
10300: getpid() = 10300
[10288]
10300: getpid() = 10300
[10288]
10300: kill(10300, SIGBUS) = 0
10300: signotifywait() = 10
10300: sigprocmask(SIG_SETMASK, 0x7FBDAD70, 0x00000000) = 0
10300: sigprocmask(SIG_SETMASK, 0x7FBDAD70, 0xFFBEF038) = 0
10300: lwp_sigredirect(1, SIGBUS, 0xFFBEEFB4) = 0
10300: lwp_sigredirect(1, SIGBUS, 0xFFBEEFB4) Err#22 EINVAL
10300: sigprocmask(SIG_SETMASK, 0xFFBEF038, 0x00000000) = 0
10300: setcontext(0xFFBEEF28)
10300: Received signal #10, SIGBUS [default]
10300: siginfo: SIGBUS pid=10300 uid=60001
10300: *** process killed ***
The fault appears to occur at these lines (repeated from above):
10300: chdir("/") = 0
10300: Incurred fault #5, FLTACCESS %pc = 0x0011FBAC
On successful requests, the truss shows the following after the chdir
statement:
.....
10296: chdir("/") = 0
10296: time() = 1049902947
10296: write(19, " 1 2 8 . 2 0 5 . 1 9 8 .".., 160) = 160
10296: shutdown(3, 1, 1) Err#134 ENOTCONN
10296: close(3) = 0
10296: sigaction(SIGUSR1, 0xFFBEFA28, 0xFFBEFB28) = 0
10296: getcontext(0xFFBEF5F0)
10296: getcontext(0xFFBEF228)
10296: getcontext(0xFFBEF580)
10296: write(2, " 1 0 2 9 6 A p a c h e".., 50) = 50
10296: write(6, " - > F E T C H".., 73) = 73
.....
We tried truss -u to get some extra info, but with it attached we can't
get the process to seg fault, because it slows the machine down too
much. The truss -u output for a successful process shows the following
between the chdir and the time calls:
......
20209/1: -> libc_psr:memcmp(0x2c8068, 0xffbef408, 0x13, 0xffbef41b)
20209/1: <- libc_psr:memcmp() = 0
20209/1: -> libc_psr:memcmp(0x3d6f18, 0x6488c1, 0xc, 0x6488cd)
20209/1: <- libc_psr:memcmp() = 0
20209/1: -> libc_psr:memcpy(0xffbef408, 0x648878, 0x6, 0x1)
20209/1: <- libc_psr:memcpy() = 0xffbef408
20209/1: -> libc_psr:memcmp(0x1e4358, 0xffbef408, 0x8, 0xffbef410)
20209/1: <- libc_psr:memcmp() = 0
20209/1: -> libc_psr:memcpy(0xffbef408, 0x648880, 0x23, 0x1)
...for thousands of lines.....
Then it lists these, which I think are related to mod_auth_dce:
20209/1: -> libc:pthread_mutex_trylock(0x7fabd2f0, 0x7fabd2f0, 0x6d6, 0x1a5e)
20209/1: -> libthread:pthread_mutex_trylock(0x7fabd2f0, 0x7fabd2f0, 0x6d6, 0x1a5e)
20209/1: <- libc:pthread_mutex_trylock() = 0
20209/1: -> libc_psr:memcmp(0x7fabd308, 0x30a0c0, 0x10, 0x1a5e)
20209/1: <- libc_psr:memcmp() = 0
20209/1: -> libc:pthread_mutex_unlock(0x7fabd2f0, 0x7fabd2f0, 0x6d6, 0x1a5e)
20209/1: -> libthread:pthread_mutex_unlock(0x7fabd2f0, 0x7fabd2f0, 0x6d6, 0x1a5e)
20209/1: <- libc:pthread_mutex_unlock() = 0
20209/1: -> libdce:resetpag(0x1d28f0, 0x58, 0xf, 0xe)
20209/1: -> libc:___errno(0x0, 0x0, 0x0, 0x0)
20209/1: -> libthread:thr_main(0x0, 0x26028, 0x0, 0x0)
20209/1: <- libthread:thr_main() = 1
20209/1: <- libc:___errno() = 0x1e2f0c
20209/1: <- libdce:resetpag() = -1
20209/1: -> libc:strlen(0x305890, 0x305890, 0xa500000, 0x0)
20209/1: <- libc:strlen() = 14
20209/1: -> libc:strlen(0x3058a0, 0x305820, 0x1, 0x0)
20209/1: <- libc:strlen() = 6
20209/1: -> libc:strlen(0x3058a0, 0x0, 0x0, 0x0)
20209/1: <- libc:strlen() = 6
20209/1: -> libc:time(0x0, 0x0, 0x3058a5, 0x31)
20209: time() = 1049918153
The last bit seems to be DCE related, but I'm not sure about the memory
statements.
Does anyone know what's going on at this point in the cycle? Given that
these appear to be at the end of the child's cycle, is this a mod_perl
or apache problem (i.e., should I post to the apache list)?
We're working on getting a core dump file.
Thanks for any insight.
Jim Brandt
==========================================
Jim Brandt
Administrative Computing Services
University at Buffalo
Re: Apache/1.3.26 (Unix) mod_perl/1.27 seg faults
Posted by Jim Brandt <cb...@buffalo.edu>.
On Tuesday, April 15, 2003, at 07:26 PM, Stas Bekman wrote:
> So you can see that the problem happens on the Perl side. However,
> since your perl was compiled without -Doptimize='-g', the trace doesn't
> show the arguments, so it's hard to tell what the cause was.
We're still plugging away at this. I've requested that Perl be
recompiled as you noted above (I don't support the system in question
myself).
In the meantime, I was able to reproduce the error on another computer.
It's a different model Sun box also running Solaris 8 and the seg fault
came up while running a load test with the same code.
I'll post with a more detailed backtrace from the original
configuration as soon as I get it.
Thanks for your help thus far.
Jim
Re: Apache/1.3.26 (Unix) mod_perl/1.27 seg faults
Posted by Stas Bekman <st...@stason.org>.
Jim Brandt wrote:
>
>> was this process any different from the one documented here?
>> http://perl.apache.org/docs/2.0/devel/debug/c.html#Obtaining_core_Files_under_Solaris
>>
>
> I think a blurb like this would help:
Thanks Jim, I've committed it.
__________________________________________________________________
Stas Bekman JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/ mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org http://ticketmaster.com
Re: Apache/1.3.26 (Unix) mod_perl/1.27 seg faults
Posted by Jim Brandt <cb...@buffalo.edu>.
> was this process any different from the one documented here?
> http://perl.apache.org/docs/2.0/devel/debug/c.html#Obtaining_core_Files_under_Solaris
>
I think a blurb like this would help:
By default, Solaris 8 won't allow a setuid process to write a core file
to the file system. Since Apache starts as root and spawns children as
'nobody', crashes won't produce core files unless you modify the
system settings.
To see the current settings, run the coreadm command with no parameters
and you'll see:
%coreadm
global core file pattern:
init core file pattern: core
global core dumps: disabled
per-process core dumps: enabled
global setid core dumps: disabled
per-process setid core dumps: disabled
global core dump logging: disabled
These settings are stored in the /etc/coreadm.conf file, but you should
set them with the coreadm utility. As super-user, run coreadm with -g to
set the pattern and path for core files (a few expansion variables are
available here) and -e to enable the disabled items you need. After
setting a new pattern, enabling global, global-setid, and log, and
rebooting the system (a reboot is required), the new settings look like:
%coreadm
global core file pattern: /usr/local/apache/cores/core.%f.%p
init core file pattern: core
global core dumps: enabled
per-process core dumps: enabled
global setid core dumps: enabled
per-process setid core dumps: disabled
global core dump logging: enabled
Now you'll start to see core files in the designated cores directory;
they will look like core.httpd.2222, where httpd is the name of the
executable and 2222 is the process id. The new core files are read/write
for root only to maintain some security, but you should probably do this
on development systems only.
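Taken together, the steps above amount to something like the following. This is only a sketch: the cores path matches the pattern shown in the settings above, and you should verify the flags against the coreadm man page on your own system.

```shell
# Set the global core file pattern: %f expands to the executable
# name and %p to the process id, giving e.g. core.httpd.2222.
coreadm -g /usr/local/apache/cores/core.%f.%p

# Enable global core dumps, global setid core dumps, and dump logging.
coreadm -e global -e global-setid -e log

# Reboot for the new settings to take effect, then run coreadm
# with no arguments to confirm them.
```
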
Re: Apache/1.3.26 (Unix) mod_perl/1.27 seg faults
Posted by Stas Bekman <st...@stason.org>.
Jim Brandt wrote:
>> Rather than guessing what could be the problem, why not get a core
>> file and look at its trace?
>
>
> After a bit of playing with the system settings, we were able to get
> some core files. We had to use the coreadm utility on Solaris to allow a
> setuid process to leave a core file.
was this process any different from the one documented here?
http://perl.apache.org/docs/2.0/devel/debug/c.html#Obtaining_core_Files_under_Solaris
> Anyway, here are the results of one of the core dumps from gdb:
>
> #0 0x00139f38 in Perl_op_const_sv ()
> #1 0x0016b388 in Perl_sv_setsv ()
> #2 0x0015e1fc in Perl_pp_sassign ()
> #3 0x0015dc80 in Perl_runops_standard ()
> #4 0x0010dfc8 in S_call_body ()
> #5 0x0010dcc8 in perl_call_sv ()
> #6 0x0010d7e0 in perl_call_method ()
So you can see that the problem happens on the Perl side. However, since
your perl was compiled without -Doptimize='-g', the trace doesn't show
the arguments, so it's hard to tell what the cause was.
> #7 0x0003204c in perl_call_handler (sv=0x36cee4, r=0x392040, args=0x0) at mod_perl.c:1655
> #8 0x000310a0 in perl_run_stacked_handlers (hook=0x1b55b8 "PerlHandler", r=0x392040, handlers=0x36cf50) at mod_perl.c:1371
> #9 0x0002d620 in perl_handler (r=0x392040) at mod_perl.c:897
> #10 0x000a787c in ap_invoke_handler (r=0x392040) at http_config.c:518
> #11 0x000c6970 in process_request_internal (r=0x392040) at http_request.c:1308
> #12 0x000c69f4 in ap_process_request (r=0x392040) at http_request.c:1324
> #13 0x000b95b4 in child_main (child_num_arg=3) at http_main.c:4603
> #14 0x000b9940 in make_child (s=0x201840, slot=3, now=1050426606) at http_main.c:4773
> #15 0x000b9a64 in startup_children (number_to_start=12) at http_main.c:4800
> #16 0x000ba494 in standalone_main (argc=1, argv=0xffbefb04) at http_main.c:5108
> #17 0x000bb0f0 in main (argc=1, argv=0xffbefb04) at http_main.c:5456
Re: Apache/1.3.26 (Unix) mod_perl/1.27 seg faults
Posted by Jim Brandt <cb...@buffalo.edu>.
> Rather than guessing what could be the problem, why not get a core
> file and look at its trace?
After a bit of playing with the system settings, we were able to get
some core files. We had to use the coreadm utility on Solaris to allow
a setuid process to leave a core file.
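For reference, a minimal way to pull a backtrace like the one below out of a core file. The paths here are assumptions, not taken from the actual setup; the binary must be the exact httpd that produced the core.

```shell
# Non-interactive backtrace: load the core against the matching
# httpd binary, print the stack of the faulting thread, and exit.
gdb -batch -ex bt /usr/local/apache/bin/httpd \
    /usr/local/apache/cores/core.httpd.10300
```
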
Anyway, here are the results of one of the core dumps from gdb:
#0 0x00139f38 in Perl_op_const_sv ()
#1 0x0016b388 in Perl_sv_setsv ()
#2 0x0015e1fc in Perl_pp_sassign ()
#3 0x0015dc80 in Perl_runops_standard ()
#4 0x0010dfc8 in S_call_body ()
#5 0x0010dcc8 in perl_call_sv ()
#6 0x0010d7e0 in perl_call_method ()
#7 0x0003204c in perl_call_handler (sv=0x36cee4, r=0x392040, args=0x0) at mod_perl.c:1655
#8 0x000310a0 in perl_run_stacked_handlers (hook=0x1b55b8 "PerlHandler", r=0x392040, handlers=0x36cf50) at mod_perl.c:1371
#9 0x0002d620 in perl_handler (r=0x392040) at mod_perl.c:897
#10 0x000a787c in ap_invoke_handler (r=0x392040) at http_config.c:518
#11 0x000c6970 in process_request_internal (r=0x392040) at http_request.c:1308
#12 0x000c69f4 in ap_process_request (r=0x392040) at http_request.c:1324
#13 0x000b95b4 in child_main (child_num_arg=3) at http_main.c:4603
#14 0x000b9940 in make_child (s=0x201840, slot=3, now=1050426606) at http_main.c:4773
#15 0x000b9a64 in startup_children (number_to_start=12) at http_main.c:4800
#16 0x000ba494 in standalone_main (argc=1, argv=0xffbefb04) at http_main.c:5108
#17 0x000bb0f0 in main (argc=1, argv=0xffbefb04) at http_main.c:5456
Re: Apache/1.3.26 (Unix) mod_perl/1.27 seg faults
Posted by Stas Bekman <st...@stason.org>.
Jim Brandt wrote:
> I did some additional testing on Mac OS X with Perl 5.8. I didn't see
> the seg faults, but I did notice something that was the same as on Solaris.
>
> When running under Apache::PerlRun (these problems are all under
> Apache::PerlRun) on both systems, I also get "subroutine xxx redefined"
> warnings in the error log for scripts with even the simplest
> subroutines. These seem to appear as each child serves its second
> request. This happens whether the sub is at the bottom of the main
> program or require'd from a separate file.
>
> I added:
>
> PerlSetVar PerlRunOnce On
>
> and the "subroutine redefined" messages went away, but with a massive
> performance hit.
>
> Is it possible that I set up two faulty mod_perl configurations such
> that PerlRun isn't flushing the namespaces properly? If this is the
> case, could it be that the memory errors are a side effect of this?
Rather than guessing what could be the problem, why not get a core file and
look at its trace?
> On Friday, April 11, 2003, at 01:25 PM, Jim Brandt wrote:
>
>> We're running mod_perl with apache on Solaris 8 with Perl 5.6.1. While
>> doing some load tests on a script, we've seen intermittent seg faults
>> and bus errors. These only appear under load:
>>
>> [Thu Apr 10 15:09:06 2003] [notice] child pid 11276 exit signal
>> Segmentation Fault (11)
>> [Thu Apr 10 15:09:49 2003] [notice] child pid 11265 exit signal Bus
>> Error (10)
>
>
>
> ==========================================
> Jim Brandt
> Administrative Computing Services
> University at Buffalo
Re: Apache/1.3.26 (Unix) mod_perl/1.27 seg faults
Posted by Jim Brandt <cb...@buffalo.edu>.
I did some additional testing on Mac OS X with Perl 5.8. I didn't see
the seg faults, but I did notice something that was the same as on
Solaris.
When running under Apache::PerlRun (these problems are all under
Apache::PerlRun) on both systems, I also get "subroutine xxx redefined"
warnings in the error log for scripts with even the simplest
subroutines. These seem to appear as each child serves its second
request. This happens whether the sub is at the bottom of the main
program or require'd from a separate file.
I added:
PerlSetVar PerlRunOnce On
and the "subroutine redefined" messages went away, but with a massive
performance hit.
Is it possible that I set up two faulty mod_perl configurations such
that PerlRun isn't flushing the namespaces properly? If this is the
case, could it be that the memory errors are a side effect of this?
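For context, the kind of PerlRun setup being described looks roughly like this hypothetical httpd.conf fragment; the Alias path and Location are assumptions, and only the PerlSetVar line comes from the message above.

```apache
# mod_perl 1.x: run unmodified CGI scripts under Apache::PerlRun.
Alias /cgi-perl/ /usr/local/apache/cgi-perl/
<Location /cgi-perl>
    SetHandler  perl-script
    PerlHandler Apache::PerlRun
    Options     +ExecCGI
    PerlSendHeader On
    # Workaround described above: recompile the script on every
    # request; this silences the "subroutine redefined" warnings
    # but at a large performance cost.
    PerlSetVar  PerlRunOnce On
</Location>
```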
On Friday, April 11, 2003, at 01:25 PM, Jim Brandt wrote:
> We're running mod_perl with apache on Solaris 8 with Perl 5.6.1. While
> doing some load tests on a script, we've seen intermittent seg faults
> and bus errors. These only appear under load:
>
> [Thu Apr 10 15:09:06 2003] [notice] child pid 11276 exit signal
> Segmentation Fault (11)
> [Thu Apr 10 15:09:49 2003] [notice] child pid 11265 exit signal Bus
> Error (10)