Posted to bugs@httpd.apache.org by bu...@apache.org on 2004/08/12 20:00:19 UTC

DO NOT REPLY [Bug 30627] New: - possible bug in handling ALARM signals on Solaris 9

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=30627>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=30627

           Summary: possible bug in handling ALARM signals on Solaris 9
           Product: Apache httpd-1.3
           Version: 1.3.31
          Platform: Sun
        OS/Version: Solaris
            Status: NEW
          Severity: Major
          Priority: Other
         Component: core
        AssignedTo: bugs@httpd.apache.org
        ReportedBy: jmurphy@buffalo.edu


Our central campus web server has been having stability issues 
ever since we upgraded to Solaris 9.  Initially we just
copied the Apache binaries from the previous installation, but
we've recently rebuilt using Apache 1.3.31 and see the same behavior.

In general, we see two problems, which I think have the same cause:

1. Apache servers will occasionally fail to acquire an fcntl accept
   lock, causing the server to exit.
2. Apache servers occasionally segfault.

We tried moving the lockfile to a different location and changing the
type of lock, without any luck.
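
For readers unfamiliar with fcntl-serialized accept, here is a minimal
sketch of the locking pattern, written for illustration only.  It is not
Apache's actual code: the function names lockfile_op and
accept_lock_roundtrip and the temporary file path are hypothetical.  It
takes a whole-file write lock with F_SETLKW, retries when a signal
interrupts the wait, and hands any other errno (such as EDEADLK) back to
the caller.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Take (F_WRLCK) or release (F_UNLCK) a whole-file advisory lock.
 * Retries when a signal interrupts the wait (EINTR).
 * Returns 0 on success, otherwise the errno value (e.g. EDEADLK). */
static int lockfile_op(int fd, short type)
{
    struct flock fl;
    int rc;

    assert(type == F_WRLCK || type == F_UNLCK);

    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through end of file" */

    do {
        rc = fcntl(fd, F_SETLKW, &fl);
    } while (rc < 0 && errno == EINTR);

    return rc < 0 ? errno : 0;
}

/* Lock, do the serialized work, unlock.  Uses a throwaway lockfile
 * created with mkstemp (hypothetical path, for the demo only). */
int accept_lock_roundtrip(void)
{
    char path[] = "/tmp/accept_lock_demo_XXXXXX";
    int fd = mkstemp(path);
    int err;

    if (fd < 0)
        return errno;

    err = lockfile_op(fd, F_WRLCK);
    if (err == 0) {
        /* A server child would call accept() here. */
        err = lockfile_op(fd, F_UNLCK);
    }

    close(fd);
    unlink(path);
    return err;
}
```

fcntl locks belong to the process, not to a thread, so two threads in
one child that both run this path do not exclude each other.  That may
be relevant to the traces below, but it is only a guess.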

After trussing the apache servers during these two different problems,
I noticed that in both cases, immediately before the segfault or EDEADLK,
apache receives an ALARM signal interrupting an lwp_park system call.  Normally
the ALARMs just come in during reads/writes, from what I can see.

Anyways, it seems that the ALARM is received in a thread other than lwp#1 
which seems to handle the main loop.

In the following trace, apache is clearly working in LWP#1, but after
an ALARM signal is received inside lwp_park, control seems to go to 
a different thread, with unexpected results:

/1:     poll(0xFFBFF8B8, 1, 0)                          = 0
/1:     write(7, " H T T P / 1 . 1   3 0 4".., 222)     = 222
/1:     door_info(4, 0xFFBFD5E0)                        = 0
/1:     door_call(4, 0xFFBFD5C8)                        = 0
/1:     time()                                          = 1092327277
/1:     write(6, " u b - c o u n s e l i n".., 207)     = 207
/1:     times(0x7EAC09CC)                               = 14875555
/1:     llseek(8, 0, SEEK_CUR)                          = 0
/1:     close(8)                                        = 0
/1:     sigaction(SIGUSR1, 0xFFBFF950, 0xFFBFFA70)      = 0
/1:     read(7, 0x004E1CF0, 4096)       (sleeping...)   
/203:   lwp_park(0x7F71FC98, 0)                         Err#62 ETIME
/203:   lwp_park(0x7F71FC98, 0)         (sleeping...)   
/203:       Received signal #14, SIGALRM, in lwp_park() [caught]
/203:   lwp_park(0x7F71FC98, 0)                         Err#4 EINTR
/203:   sigprocmask(SIG_SETMASK, 0x7F71F7DC, 0x00000000) = 0
/1:     read(7, 0x004E1CF0, 4096)                       Err#9 EBADF
/203:   close(7)                                        = 0
/203:   getcontext(0x7F71F538)                          
/203:   sigprocmask(SIG_SETMASK, 0x7F83A074, 0x7F71F300) = 0
/203:   lwp_unpark(203, 1)                              = 0
/203:   setcontext(0x7F71F310)                          
/1:     time()                                          = 1092327294
/1:     close(-1)                                       Err#9 EBADF
/1:     sigaction(SIGUSR1, 0xFFBFF950, 0xFFBFFA70)      = 0
/203:   sigaction(SIGALRM, 0xFFBFF950, 0xFFBFFA70)      = 0
/203:   sigaction(SIGUSR1, 0xFFBFF950, 0xFFBFFA70)      = 0
/203:   fcntl(21, F_SETLKW, 0x004B444C)                 Err#45 EDEADLK
/203:   time()                                          = 1092327294
/203:   write(15, " [ T h u   A u g   1 2  ".., 229)    = 229
/203:   sigaction(SIGHUP, 0xFFBFF890, 0xFFBFF9B0)       = 0
/203:   sigaction(SIGUSR1, 0xFFBFF890, 0xFFBFF9B0)      = 0
/203:   lwp_mutex_lock(0x7F838A00)                      = 0
/203:   write(1, " L a u n c h i n g . . .".., 48)      = 48
/203:   _exit(15)


Here is another trace of the deadlock, where I just watched open/close/fcntl:


/1:     close(8)                                        = 0
/757:       Received signal #14, SIGALRM, in lwp_park() [caught]
/757:   close(7)                                        = 0
/1:     fcntl(21, F_SETLKW, 0x004B4428)                 = 0
/1:     fcntl(7, F_SETFD, 0x00000001)                   = 0
/1:     fcntl(7, F_GETFL, 0x00000000)                   = 130
/1:     fcntl(7, F_SETFL, 0x00000002)                   = 0
/1:     open("/info/www/.htaccess", O_RDONLY)           Err#2 ENOENT
...stuff deleted...
/1:     close(40)                                       = 0
/1:     close(8)                                        = 0
/1:     close(7)                                        = 0
/1:     fcntl(21, F_SETLKW, 0x004B444C)                 Err#45 EDEADLK


and here is a trace of the same sort of signal handling, resulting in
a segfault: (this one is very odd in that it seems two threads are
trying to execute the same code concurrently)

/1:     close(8)                                        = 0
/1:     close(58)                                       = 0
/1:     close(56)                                       = 0
/1:     close(45)                                       = 0
/194:       Received signal #14, SIGALRM, in lwp_park() [caught]
/1:     close(7)                                        Err#9 EBADF
/194:   close(7)                                        = 0
/1:     fcntl(21, F_SETLKW, 0x004B444C) (sleeping...)
/194:   fcntl(21, F_SETLKW, 0x004B444C) (sleeping...)
/194:   fcntl(21, F_SETLKW, 0x004B444C)                 = 0
/1:     fcntl(21, F_SETLKW, 0x004B444C)                 = 0
/1:     fcntl(21, F_SETLKW, 0x004B4428)                 = 0
/1:     fcntl(7, F_SETFD, 0x00000001)                   = 0
/1:     fcntl(7, F_GETFL, 0x00000000)                   = 130
/194:       Incurred fault #6, FLTBOUNDS  %pc = 0x7F952540
/194:         siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
/194:       Received signal #11, SIGSEGV [caught]
/194:         siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
/1:     fcntl(7, F_SETFL, 0x00000002)                   = 0
/1:         Received signal #11, SIGSEGV [default]
/1:           siginfo: SIGSEGV pid=16796 uid=60001
/194:       Incurred fault #6, FLTBOUNDS  %pc = 0x7F952540
/194:         siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000

We're running apache 1.3.31 with a bunch of modules.  Since this
appears to be a race condition that we only see under load and we
can't disable modules on our production server, I haven't tested disabling
individual modules.

mod_auth_dce uses threads, but we haven't seen this problem before when
using mod_auth_dce.  php4, fastcgi and apache-ssl are also being used.

Anyways, the reason I feel this is related to ALARM handling on Solaris 9
is because of a note in the Solaris 9 developer docs:

http://docs.sun.com/db/doc/806-6867/6jfpgdcnt?q=alarm&a=view

>Effective with the Solaris 9 Operating Environment, calls to alarm() or to 
>setitimer(ITIMER_REAL) will cause the resulting SIGALRM signal to be sent to 
>the process.


some info on our server:

> /usr/local/apache/httpsd -V
Server version: Apache/1.3.31 Ben-SSL/1.55 (Unix)
Server built:   Aug  4 2004 10:28:40
Server's Module Magic Number: 19990320:16
Server compiled with....
 -D HAVE_MMAP
 -D USE_MMAP_SCOREBOARD
 -D USE_MMAP_FILES
 -D NO_WRITEV
 -D HAVE_FCNTL_SERIALIZED_ACCEPT
 -D HAVE_SYSVSEM_SERIALIZED_ACCEPT
 -D HAVE_PTHREAD_SERIALIZED_ACCEPT
 -D DYNAMIC_MODULE_LIMIT=64
 -D HARD_SERVER_LIMIT=1024
 -D HTTPD_ROOT="/usr/local/apache"
 -D SUEXEC_BIN="/usr/local/apache/bin/suexec"
 -D DEFAULT_PIDLOG="logs/httpd.pid"
 -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
 -D DEFAULT_LOCKFILE="logs/accept.lock"
 -D DEFAULT_ERRORLOG="logs/error_log"
 -D TYPES_CONFIG_FILE="conf/mime.types"
 -D SERVER_CONFIG_FILE="conf/httpd.conf"
 -D ACCESS_CONFIG_FILE="conf/access.conf"
 -D RESOURCE_CONFIG_FILE="conf/srm.conf"
> /usr/local/apache/httpsd -l
Compiled-in modules:
  http_core.c
  mod_php4.c
  mod_env.c
  mod_log_config.c
  mod_mime_magic.c
  mod_mime.c
  mod_negotiation.c
  mod_status.c
  mod_info.c
  mod_include.c
  mod_autoindex.c
  mod_dir.c
  mod_cgi.c
  mod_fastcgi.c
  mod_asis.c
  mod_imap.c
  mod_actions.c
  mod_speling.c
  mod_userdir.c
  mod_alias.c
  mod_rewrite.c
  mod_access.c
  mod_auth_dce.c
  mod_auth.c
  mod_expires.c
  mod_headers.c
  mod_setenvif.c
  apache_ssl.c
suexec: disabled; invalid wrapper /usr/local/apache/bin/suexec
>
