Posted to modperl@perl.apache.org by Jim Brandt <cb...@buffalo.edu> on 2003/04/11 19:25:20 UTC

Apache/1.3.26 (Unix) mod_perl/1.27 seg faults

We're running mod_perl with apache on Solaris 8 with Perl 5.6.1. While  
doing some load tests on a script, we've seen intermittent seg faults  
and bus errors. These only appear under load:

[Thu Apr 10 15:09:06 2003] [notice] child pid 11276 exit signal  
Segmentation Fault (11)
[Thu Apr 10 15:09:49 2003] [notice] child pid 11265 exit signal Bus  
Error (10)

In addition to mod_perl, the only non-standard module we have installed  
is mod_auth_dce (at version 3.2). The script is using the following  
modules:

use CGI qw(:standard);
use URI::Escape;
use DBI;
use Date::Calc;
use Template;
use Cache::FileCache;
use IO::Socket;
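
Of the above, Cache::FileCache is the module active right where the crash
shows up in the truss output further down. Its use in the script is roughly
along these lines (a sketch only; the key, value, and expiry are made up,
while the namespace and cache root match the /tmp/FileCache/webreg paths
visible in the trace):

  use Cache::FileCache;

  # namespace/cache_root correspond to /tmp/FileCache/webreg/... in the truss output
  my $cache = Cache::FileCache->new({
      namespace  => 'webreg',
      cache_root => '/tmp/FileCache',
  });

  # illustrative key and value only
  $cache->set('some_key', { status => 'Database OK!' }, '10 minutes');
  my $cached = $cache->get('some_key');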

At first we thought Apache::DBI might be the culprit and we saw the  
following in the debug logs:

12729 Apache::DBI             push PerlCleanupHandler
12729 Apache::DBI             need ping: yes
12729 Apache::DBI             new connect to XXXXXX#PrintError=1AutoCommit=0RaiseError=1'
12729 Apache::DBI             disconnect (overloaded)
---> When a server crashed it stopped here. <---
12729 Apache::DBI             PerlCleanupHandler
12729 Apache::DBI             PerlCleanupHandler rollback for
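
(For reference, debug lines like the ones above come from Apache::DBI's
$Apache::DBI::DEBUG switch. A typical way to wire it in is sketched below;
the startup file name is illustrative, and Apache::DBI has to be loaded
before DBI so that connect() gets overloaded.)

  # startup.pl -- pulled in via "PerlRequire /path/to/startup.pl" in httpd.conf
  use Apache::DBI ();        # must be loaded before DBI
  $Apache::DBI::DEBUG = 2;   # emits the "<pid> Apache::DBI ..." debug lines
  use DBI ();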

We did some tests without DBI and determined (we think) that DBI isn't  
the problem.

We were able to catch some processes with truss. On a seg fault  
response, the truss looked like this:

.....
.....
10300:  write(3, " D a t a b a s e   O K !", 12)        = 12
10300:  write(3, " < / P >", 4)                         = 4  <-- last item on page
10300:  time()                                          = 1049901880
10300:  stat64("/tmp/FileCache/webreg/1/f/b/", 0x001E2610) = 0
10300:  open64("/tmp/FileCache/webreg/1/f/b/1fbefee9cfb86926757519555e077fd6a21aef0f.tmp10300", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 9
10300:  fstat64(9, 0x001E2738)                          = 0
10300:  fcntl(9, F_SETFD, 0x00000001)                   = 0
10300:  fstat64(9, 0xFFBEF398)                          = 0
10300:  ioctl(9, TCGETA, 0xFFBEF324)                    Err#25 ENOTTY
10300:  write(9, "040404 4 3 2 1040404\b02".., 151)     = 151
10300:  close(9)                                        = 0
10300:  rename("/tmp/FileCache/webreg/1/f/b/1fbefee9cfb86926757519555e077fd6a21aef0f.tmp10300", "/tmp/FileCache/webreg/1/f/b/1fbefee9cfb86926757519555e077fd6a21aef0f") = 0
10300:  getcontext(0xFFBEF0A8)
10300:  write(2, " 1 0 3 0 0   A p a c h e".., 55)      = 55
10300:  chdir("/")                                      = 0
10300:      Incurred fault #5, FLTACCESS  %pc = 0x0011FBAC
10300:        siginfo: SIGBUS BUS_ADRALN addr=0x7061637B
10300:      Received signal #10, SIGBUS [caught]
10300:        siginfo: SIGBUS BUS_ADRALN addr=0x7061637B
10300:  sigaction(SIGBUS, 0xFFBEF048, 0x00000000)       = 0
10300:  sigprocmask(SIG_SETMASK, 0x7FBCEFF0, 0x00000000) = 0
10300:  chdir("/usr/local/apache/cores")                = 0
10300:  sigaction(SIGBUS, 0xFFBEEEC8, 0xFFBEEFC8)       = 0
10300:  getpid()                                        = 10300 [10288]
10300:  getpid()                                        = 10300 [10288]
10300:  kill(10300, SIGBUS)                             = 0
10300:  signotifywait()                                 = 10
10300:  sigprocmask(SIG_SETMASK, 0x7FBDAD70, 0x00000000) = 0
10300:  sigprocmask(SIG_SETMASK, 0x7FBDAD70, 0xFFBEF038) = 0
10300:  lwp_sigredirect(1, SIGBUS, 0xFFBEEFB4)          = 0
10300:  lwp_sigredirect(1, SIGBUS, 0xFFBEEFB4)          Err#22 EINVAL
10300:  sigprocmask(SIG_SETMASK, 0xFFBEF038, 0x00000000) = 0
10300:  setcontext(0xFFBEEF28)
10300:      Received signal #10, SIGBUS [default]
10300:        siginfo: SIGBUS pid=10300 uid=60001
10300:          *** process killed ***

These lines from the trace above appear to be where the fault occurs:

10300:  chdir("/")                                      = 0
10300:      Incurred fault #5, FLTACCESS  %pc = 0x0011FBAC

On successful requests, the truss shows the following after the chdir  
statement:

.....
10296:  chdir("/")                                      = 0
10296:  time()                                          = 1049902947
10296:  write(19, " 1 2 8 . 2 0 5 . 1 9 8 .".., 160)    = 160
10296:  shutdown(3, 1, 1)                               Err#134 ENOTCONN
10296:  close(3)                                        = 0
10296:  sigaction(SIGUSR1, 0xFFBEFA28, 0xFFBEFB28)      = 0
10296:  getcontext(0xFFBEF5F0)
10296:  getcontext(0xFFBEF228)
10296:  getcontext(0xFFBEF580)
10296:  write(2, " 1 0 2 9 6   A p a c h e".., 50)      = 50
10296:  write(6, "         - >   F E T C H".., 73)      = 73
.....
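
(For reference, traces like the ones above are gathered with truss roughly as
follows; the flags and output paths are illustrative, not the exact command we
used. The second form, with -u, adds the user-level library tracing discussed
next.)

  PID=10300                                      # pid of an httpd child
  truss -f -p $PID -o /tmp/truss.$PID            # system calls only
  truss -f -u '*' -p $PID -o /tmp/truss-u.$PID   # plus user-level library calls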

We tried using truss -u to get some extra information, but we can't get the 
process to seg fault while truss -u is attached because it slows the machine 
down too much. The truss -u output for a successful process shows the 
following between the chdir and time calls:

......
20209/1:        -> libc_psr:memcmp(0x2c8068, 0xffbef408, 0x13, 0xffbef41b)
20209/1:        <- libc_psr:memcmp() = 0
20209/1:        -> libc_psr:memcmp(0x3d6f18, 0x6488c1, 0xc, 0x6488cd)
20209/1:        <- libc_psr:memcmp() = 0
20209/1:        -> libc_psr:memcpy(0xffbef408, 0x648878, 0x6, 0x1)
20209/1:        <- libc_psr:memcpy() = 0xffbef408
20209/1:        -> libc_psr:memcmp(0x1e4358, 0xffbef408, 0x8, 0xffbef410)
20209/1:        <- libc_psr:memcmp() = 0
20209/1:        -> libc_psr:memcpy(0xffbef408, 0x648880, 0x23, 0x1)
...for thousands of lines.....

Then it lists the following, which I think are related to mod_auth_dce:

20209/1:        -> libc:pthread_mutex_trylock(0x7fabd2f0, 0x7fabd2f0, 0x6d6, 0x1a5e)
20209/1:        -> libthread:pthread_mutex_trylock(0x7fabd2f0, 0x7fabd2f0, 0x6d6, 0x1a5e)
20209/1:        <- libc:pthread_mutex_trylock() = 0
20209/1:        -> libc_psr:memcmp(0x7fabd308, 0x30a0c0, 0x10, 0x1a5e)
20209/1:        <- libc_psr:memcmp() = 0
20209/1:        -> libc:pthread_mutex_unlock(0x7fabd2f0, 0x7fabd2f0, 0x6d6, 0x1a5e)
20209/1:        -> libthread:pthread_mutex_unlock(0x7fabd2f0, 0x7fabd2f0, 0x6d6, 0x1a5e)
20209/1:        <- libc:pthread_mutex_unlock() = 0
20209/1:        -> libdce:resetpag(0x1d28f0, 0x58, 0xf, 0xe)
20209/1:          -> libc:___errno(0x0, 0x0, 0x0, 0x0)
20209/1:            -> libthread:thr_main(0x0, 0x26028, 0x0, 0x0)
20209/1:            <- libthread:thr_main() = 1
20209/1:          <- libc:___errno() = 0x1e2f0c
20209/1:        <- libdce:resetpag() = -1
20209/1:        -> libc:strlen(0x305890, 0x305890, 0xa500000, 0x0)
20209/1:        <- libc:strlen() = 14
20209/1:        -> libc:strlen(0x3058a0, 0x305820, 0x1, 0x0)
20209/1:        <- libc:strlen() = 6
20209/1:        -> libc:strlen(0x3058a0, 0x0, 0x0, 0x0)
20209/1:        <- libc:strlen() = 6
20209/1:        -> libc:time(0x0, 0x0, 0x3058a5, 0x31)
20209:  time()                                          = 1049918153

The last bit seems to be DCE-related, but I'm not sure what the memory calls 
(memcmp/memcpy) relate to.

Does anyone know what's going on at this point in the cycle? Given that the 
faults appear to come at the end of the child's request cycle, is this a 
mod_perl or an apache problem (i.e., should I post to the apache list)?

We're working on getting a core dump file.

Thanks for any insight.

Jim Brandt

==========================================
Jim Brandt
Administrative Computing Services
University at Buffalo


Re: Apache/1.3.26 (Unix) mod_perl/1.27 seg faults

Posted by Jim Brandt <cb...@buffalo.edu>.
On Tuesday, April 15, 2003, at 07:26  PM, Stas Bekman wrote:

> So you can see that the problem happens on the perl side. However, 
> since your perl was compiled without -Doptimize='-g', the trace doesn't 
> show the arguments, so it's hard to tell what the cause was.

We're still plugging away at this. I've requested that Perl be 
recompiled as you noted above (I don't support the system in question 
myself).
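
(For anyone following along, the rebuild being requested boils down to adding 
the -Doptimize='-g' flag when configuring perl, with the other options 
mirroring the original build. Roughly:)

  sh Configure -des -Doptimize='-g'   # plus the site's usual Configure options
  make && make test && make install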

In the meantime, I was able to reproduce the error on another computer. 
It's a different model of Sun box, also running Solaris 8, and the seg fault 
came up while running a load test with the same code.

I'll post with a more detailed backtrace from the original 
configuration as soon as I get it.

Thanks for your help thus far.
Jim

==========================================
Jim Brandt
Administrative Computing Services
University at Buffalo


Re: Apache/1.3.26 (Unix) mod_perl/1.27 seg faults

Posted by Stas Bekman <st...@stason.org>.
Jim Brandt wrote:
> 
>> was this process any different from the one documented here?
>> http://perl.apache.org/docs/2.0/devel/debug/ 
>> c.html#Obtaining_core_Files_under_Solaris
>>
> 
> I think a blurb like this would help:

Thanks Jim, I've committed it.

__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: Apache/1.3.26 (Unix) mod_perl/1.27 seg faults

Posted by Jim Brandt <cb...@buffalo.edu>.
> was this process any different from the one documented here?
> http://perl.apache.org/docs/2.0/devel/debug/ 
> c.html#Obtaining_core_Files_under_Solaris
>

I think a blurb like this would help:

By default, Solaris 8 won't allow a setuid process to write a core file 
to the file system. Since Apache starts as root and spawns its children as 
'nobody', crashes won't produce core files unless you modify the system 
settings.

To see the current settings, run the coreadm command with no parameters  
and you'll see:

%coreadm
      global core file pattern:
        init core file pattern: core
             global core dumps: disabled
        per-process core dumps: enabled
       global setid core dumps: disabled
  per-process setid core dumps: disabled
      global core dump logging: disabled

These settings are stored in the /etc/coreadm.conf file, but you should 
change them with the coreadm utility. As super-user, you can run coreadm 
with -g to set the pattern and path for core files (a few variables are 
available here) and -e to enable some of the disabled items (an example 
invocation appears at the end of this note). After setting a new pattern, 
enabling global, global-setid, and log, and rebooting the system (a reboot 
is required), the new settings look like this:

%coreadm
      global core file pattern: /usr/local/apache/cores/core.%f.%p
        init core file pattern: core
             global core dumps: enabled
        per-process core dumps: enabled
       global setid core dumps: enabled
  per-process setid core dumps: disabled
      global core dump logging: enabled

Now you'll start to see core files in the designated cores directory. They 
are named like core.httpd.2222, where httpd is the name of the executable 
and 2222 is the process id. The new core files are readable and writable 
by root only, which preserves some security, but you should probably do 
this on development systems only.
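
For reference, the coreadm invocation that takes you from the default
settings to the ones above looks roughly like this (run as root; the
pattern/path is just an example, and remember the reboot):

  coreadm -g /usr/local/apache/cores/core.%f.%p -e global -e global-setid -e log
  coreadm    # verify the new settings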

==========================================
Jim Brandt
Administrative Computing Services
University at Buffalo


Re: Apache/1.3.26 (Unix) mod_perl/1.27 seg faults

Posted by Stas Bekman <st...@stason.org>.
Jim Brandt wrote:
>> Rather than guessing what could be the problem, why not get a core 
>> file and look at its trace?
> 
> 
> After a bit of playing with the system settings, we were able to get 
> some core files. We had to use the coreadm utility on Solaris to allow a 
> setuid process to leave a core file.

was this process any different from the one documented here?
http://perl.apache.org/docs/2.0/devel/debug/c.html#Obtaining_core_Files_under_Solaris

> Anyway, here is the backtrace gdb produced from one of the core dumps:
> 
> #0  0x00139f38 in Perl_op_const_sv ()
> #1  0x0016b388 in Perl_sv_setsv ()
> #2  0x0015e1fc in Perl_pp_sassign ()
> #3  0x0015dc80 in Perl_runops_standard ()
> #4  0x0010dfc8 in S_call_body ()
> #5  0x0010dcc8 in perl_call_sv ()
> #6  0x0010d7e0 in perl_call_method ()

So you can see that the problem happens on the perl side. However, since your 
perl was compiled without -Doptimize='-g', the trace doesn't show the 
arguments, so it's hard to tell what the cause was.

> #7  0x0003204c in perl_call_handler (sv=0x36cee4, r=0x392040, args=0x0) at mod_perl.c:1655
> #8  0x000310a0 in perl_run_stacked_handlers (hook=0x1b55b8 "PerlHandler", r=0x392040, handlers=0x36cf50) at mod_perl.c:1371
> #9  0x0002d620 in perl_handler (r=0x392040) at mod_perl.c:897
> #10 0x000a787c in ap_invoke_handler (r=0x392040) at http_config.c:518
> #11 0x000c6970 in process_request_internal (r=0x392040) at http_request.c:1308
> #12 0x000c69f4 in ap_process_request (r=0x392040) at http_request.c:1324
> #13 0x000b95b4 in child_main (child_num_arg=3) at http_main.c:4603
> #14 0x000b9940 in make_child (s=0x201840, slot=3, now=1050426606) at http_main.c:4773
> #15 0x000b9a64 in startup_children (number_to_start=12) at http_main.c:4800
> #16 0x000ba494 in standalone_main (argc=1, argv=0xffbefb04) at http_main.c:5108
> #17 0x000bb0f0 in main (argc=1, argv=0xffbefb04) at http_main.c:5456

__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: Apache/1.3.26 (Unix) mod_perl/1.27 seg faults

Posted by Jim Brandt <cb...@buffalo.edu>.
> Rather than guessing what could be the problem, why not get a core 
> file and look at its trace?

After a bit of playing with the system settings, we were able to get 
some core files. We had to use the coreadm utility on Solaris to allow 
a setuid process to leave a core file.

Anyway, here is the backtrace gdb produced from one of the core dumps:

#0  0x00139f38 in Perl_op_const_sv ()
#1  0x0016b388 in Perl_sv_setsv ()
#2  0x0015e1fc in Perl_pp_sassign ()
#3  0x0015dc80 in Perl_runops_standard ()
#4  0x0010dfc8 in S_call_body ()
#5  0x0010dcc8 in perl_call_sv ()
#6  0x0010d7e0 in perl_call_method ()
#7  0x0003204c in perl_call_handler (sv=0x36cee4, r=0x392040, args=0x0) at mod_perl.c:1655
#8  0x000310a0 in perl_run_stacked_handlers (hook=0x1b55b8 "PerlHandler", r=0x392040, handlers=0x36cf50) at mod_perl.c:1371
#9  0x0002d620 in perl_handler (r=0x392040) at mod_perl.c:897
#10 0x000a787c in ap_invoke_handler (r=0x392040) at http_config.c:518
#11 0x000c6970 in process_request_internal (r=0x392040) at http_request.c:1308
#12 0x000c69f4 in ap_process_request (r=0x392040) at http_request.c:1324
#13 0x000b95b4 in child_main (child_num_arg=3) at http_main.c:4603
#14 0x000b9940 in make_child (s=0x201840, slot=3, now=1050426606) at http_main.c:4773
#15 0x000b9a64 in startup_children (number_to_start=12) at http_main.c:4800
#16 0x000ba494 in standalone_main (argc=1, argv=0xffbefb04) at http_main.c:5108
#17 0x000bb0f0 in main (argc=1, argv=0xffbefb04) at http_main.c:5456
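
(For reference, a backtrace like the one above is produced by pointing gdb at 
the httpd binary and the core file and running "bt"; the paths below are just 
examples:)

  gdb /usr/local/apache/bin/httpd /usr/local/apache/cores/core.httpd.10300
  (gdb) bt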


==========================================
Jim Brandt
Administrative Computing Services
University at Buffalo


Re: Apache/1.3.26 (Unix) mod_perl/1.27 seg faults

Posted by Stas Bekman <st...@stason.org>.
Jim Brandt wrote:
> I did some additional testing on Mac OS X with Perl 5.8. I didn't see 
> the Seg faults, but I did notice something that was the same as on Solaris.
> 
> When running under Apache::PerlRun (these problems are all under 
> Apache::PerlRun) on both systems, I also get "subroutine xxx redefined" 
> warnings in the error log for scripts with even the simplest 
> subroutines. These seem to appear as each child serves its second 
> request. This happens whether the sub is at the bottom of the main program 
> or require'd from a separate file.
> 
> I added:
> 
>  PerlSetVar PerlRunOnce On
> 
> and the "subroutine redefined" messages went away, but with a massive 
> performance hit.
> 
> Is it possible that I set up two faulty mod_perl configurations such 
> that PerlRun isn't flushing the namespaces properly? If this is the 
> case, could it be that the memory errors are a side effect of this?

Rather than guessing what could be the problem, why not get a core file and 
look at its trace?

> On Friday, April 11, 2003, at 01:25  PM, Jim Brandt wrote:
> 
>> We're running mod_perl with apache on Solaris 8 with Perl 5.6.1. While 
>> doing some load tests on a script, we've seen intermittent seg faults 
>> and bus errors. These only appear under load:
>>
>> [Thu Apr 10 15:09:06 2003] [notice] child pid 11276 exit signal 
>> Segmentation Fault (11)
>> [Thu Apr 10 15:09:49 2003] [notice] child pid 11265 exit signal Bus 
>> Error (10)
> 
> 
> 
> ==========================================
> Jim Brandt
> Administrative Computing Services
> University at Buffalo


-- 


__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:stas@stason.org http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com


Re: Apache/1.3.26 (Unix) mod_perl/1.27 seg faults

Posted by Jim Brandt <cb...@buffalo.edu>.
I did some additional testing on Mac OS X with Perl 5.8. I didn't see 
the Seg faults, but I did notice something that was the same as on 
Solaris.

When running under Apache::PerlRun (these problems are all under 
Apache::PerlRun) on both systems, I also get "subroutine xxx redefined" 
warnings in the error log for scripts with even the simplest 
subroutines. These seem to appear as each child serves its second 
request. This happens whether the sub is at the bottom of the main program 
or require'd from a separate file.

I added:

  PerlSetVar PerlRunOnce On

and the "subroutine redefined" messages went away, but with a massive 
performance hit.

Is it possible that I set up two faulty mod_perl configurations such 
that PerlRun isn't flushing the namespaces properly? If this is the 
case, could it be that the memory errors are a side effect of this?
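
For reference, a typical mod_perl 1.x Apache::PerlRun configuration looks 
roughly like the block below; the alias, directory, and location name are 
illustrative rather than our actual settings, with PerlRunOnce shown where 
the workaround above would go:

  Alias /perl-run/ /usr/local/apache/perl-run/
  <Location /perl-run>
      SetHandler  perl-script
      PerlHandler Apache::PerlRun
      Options     +ExecCGI
      PerlSendHeader On
      # PerlSetVar PerlRunOnce On    # the workaround mentioned above
  </Location>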


On Friday, April 11, 2003, at 01:25  PM, Jim Brandt wrote:

> We're running mod_perl with apache on Solaris 8 with Perl 5.6.1. While 
> doing some load tests on a script, we've seen intermittent seg faults 
> and bus errors. These only appear under load:
>
> [Thu Apr 10 15:09:06 2003] [notice] child pid 11276 exit signal 
> Segmentation Fault (11)
> [Thu Apr 10 15:09:49 2003] [notice] child pid 11265 exit signal Bus 
> Error (10)


==========================================
Jim Brandt
Administrative Computing Services
University at Buffalo