Posted to dev@httpd.apache.org by ra...@bellglobal.com on 1998/02/03 21:51:18 UTC

Maybe it's me - more locking problems

The Gods must be against me.  Apparently I don't know how to run an Apache
server at all.  Now I have a 1.2.5 server which will serve up one 
request after a restart and then get stuck in a lock.

The only non-standard thing compiled in right now is mod_jserv, but the
symptoms are the same without that module.

The other non-standard thing, I guess, is that Apache and all of its files
are on an NFS-mounted Netapp filesystem.

The server is a Solaris 2.5.1 box.

Parent process is sitting in this state:

     0.000000 sigsuspend([] <unfinished ...>
     0.415977 --- SIGALRM (Alarm Clock) ---
     0.000214 <... sigsuspend resumed> ) = -1 EINTR (Interrupted system call) <0.415636>
     0.000272 setcontext({uc_sigmask=[ALRM], ...}) = ? <0.000113>
     0.000525 alarm(0)                  = 0 <0.000083>
     0.000389 sigprocmask(SIG_UNBLOCK, [ALRM], NULL) = 0 <0.000087>
     0.000475 sigaction(SIGALRM, {SIG_DFL}, NULL) = 0 <0.000086>
     0.000464 waitid(P_ALL, 0, {si_signo=0, si_code=SI_USER, si_pid=0, si_uid=0, ...}, WNOHANG|WEXITED|WTRAPPED) = 0 <0.000091>
     0.000499 alarm(0)                  = 0 <0.000084>
     0.000349 sigaction(SIGALRM, {0xef5b8cfc, [], 0}, {SIG_DFL}) = 0 <0.000094>
     0.000570 sigprocmask(SIG_BLOCK, [ALRM], []) = 0 <0.000090>
     0.000577 alarm(1)                  = 0 <0.000086>
     0.000351 sigsuspend([] <unfinished ...>
     0.995342 --- SIGALRM (Alarm Clock) ---
     0.000184 <... sigsuspend resumed> ) = -1 EINTR (Interrupted system call) <0.995330>
     0.000248 setcontext({uc_sigmask=[ALRM], ...}) = ? <0.000107>
     0.000517 alarm(0)                  = 0 <0.000083>
     0.000352 sigprocmask(SIG_UNBLOCK, [ALRM], NULL) = 0 <0.000087>
     0.000480 sigaction(SIGALRM, {SIG_DFL}, NULL) = 0 <0.000088>
     0.000454 waitid(P_ALL, 0, {si_signo=0, si_code=SI_USER, si_pid=0, si_uid=0, ...}, WNOHANG|WEXITED|WTRAPPED) = 0 <0.000090>
     0.000495 alarm(0)                  = 0 <0.000083>
     0.000349 sigaction(SIGALRM, {0xef5b8cfc, [], 0}, {SIG_DFL}) = 0 <0.000094>
     0.000567 sigprocmask(SIG_BLOCK, [ALRM], []) = 0 <0.000089>

Looks normal enough so far.

Child processes are sitting in:

 0.000000 fcntl(21, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}

And they got there by:

 ef5b6fd8 fcntl    (15, 7, 5cfcc)
 ef5b6fd8 _libc_fcntl (15, 7, 5cfcc, 0, 0, 0) + 8
 ef78aba8 s_fcntl  (15, 7, 5cfcc, ef611938, 34, effff944) + 164
 00019138 accept_mutex_on (62d08, 1, 0, effffa78, effffa88, 2) + 18
 0001a874 child_main (6c170, 5d000, 5d000, 5cc00, 5d000, 5b000) + 1f8
 0001ad1c make_child (61880, 2, 61400, 11, 0, 5cfc9) + d4
 0001b3a0 standalone_main (effffb78, 5cf48, 5f400, 5cc00, 75, 5f7b0) + 2a4
 0001b920 main     (3, effffc94, effffca4, 5f3f4, 1, 0) + 2a0
 00017b48 _start   (0, 0, 0, 0, 0, 0) + 5c
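
For readers following the trace: the children are blocked in a whole-file fcntl() write lock, the same arguments shown in the F_SETLKW line above (F_WRLCK, SEEK_SET, start=0, len=0). A minimal sketch of that style of accept serialization might look like the following. The function names are hypothetical and this is not Apache's actual source, only an illustration of the technique under those assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper names for illustration; not Apache's source. */

/* Open (or create) the lock file that every child would share. */
int sketch_lock_open(const char *path)
{
    assert(path != NULL);
    return open(path, O_WRONLY | O_CREAT, 0644);
}

/* Apply one whole-file lock operation, retrying if a signal interrupts it. */
int sketch_lock_op(int fd, short type)
{
    struct flock lk;
    int rc;

    memset(&lk, 0, sizeof lk);
    lk.l_type = type;          /* F_WRLCK to acquire, F_UNLCK to release */
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;              /* 0 means "to end of file": the whole file */

    do {
        /* F_SETLKW waits until the request can be granted. */
        rc = fcntl(fd, F_SETLKW, &lk);
    } while (rc < 0 && errno == EINTR);
    return rc;
}

/* Take the lock before accept(); only one child proceeds at a time. */
int sketch_mutex_on(int fd)  { return sketch_lock_op(fd, F_WRLCK); }

/* Release the lock once a connection has been accepted. */
int sketch_mutex_off(int fd) { return sketch_lock_op(fd, F_UNLCK); }
```

A child would call sketch_mutex_on(fd) before accept() and sketch_mutex_off(fd) right after it. If the lock is never granted or never released, every child ends up waiting in F_SETLKW, which matches the symptom here.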

Ok, so what are we locked on?

Parent process says:

  Current rlimit: 92 file descriptors
   0: S_IFCHR mode:0620 dev:32,0 ino:162863 uid:100 gid:7 rdev:24,2
      O_RDWR
   1: S_IFCHR mode:0620 dev:32,0 ino:162863 uid:100 gid:7 rdev:24,2
      O_RDWR
   2: S_IFCHR mode:0620 dev:32,0 ino:162863 uid:100 gid:7 rdev:24,2
      O_RDWR
   4: 0xd000  mode:0444 dev:164,0 ino:8952 uid:0 gid:0 size:0
      O_RDONLY close-on-exec
  15: S_IFCHR mode:0000 dev:32,0 ino:26208 uid:0 gid:0 rdev:42,8491
      O_RDWR
  16: S_IFREG mode:0644 dev:162,1 ino:2924135 uid:0 gid:1 size:13387
      O_WRONLY|O_APPEND
  17: S_IFCHR mode:0000 dev:32,0 ino:6664 uid:0 gid:0 rdev:42,8911
      O_RDWR
  18: S_IFREG mode:0644 dev:162,1 ino:4842016 uid:0 gid:1 size:61789
      O_WRONLY|O_APPEND
  19: S_IFREG mode:0644 dev:162,1 ino:4842022 uid:0 gid:1 size:4551030
      O_WRONLY|O_APPEND
  20: S_IFREG mode:0644 dev:162,1 ino:4842019 uid:0 gid:1 size:29387
      O_WRONLY|O_APPEND
  21: S_IFREG mode:0644 dev:162,1 ino:4842021 uid:0 gid:102 size:0
      O_WRONLY
      advisory write lock set by system 0x7FFF process 2135

Inode 4842021 is:

  4842021 -rw-r--r--   1 root     devel          0 Feb  3 15:34 logs/.nfsCAC
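
The "advisory write lock set by ... process 2135" line above can also be checked from a program with F_GETLK, which reports a conflicting lock instead of waiting for it. The sketch below assumes a hypothetical helper name and a file the caller can open read-write. Over NFS the reported pid may belong to a process on another client, so treat it as a hint rather than proof.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 and stores the owner's pid in *holder if
 * another process holds a lock that would block a whole-file write lock,
 * 0 if no such lock exists, and -1 on error.
 *
 * Caution: closing any descriptor for a file drops this process's own
 * fcntl locks on that file, so do not call this from the lock holder.
 */
int sketch_lock_holder(const char *path, pid_t *holder)
{
    struct flock lk;
    int fd, rc;

    assert(path != NULL && holder != NULL);

    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    memset(&lk, 0, sizeof lk);
    lk.l_type = F_WRLCK;       /* ask: could a write lock be placed? */
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;              /* whole file */

    rc = fcntl(fd, F_GETLK, &lk);
    close(fd);
    if (rc < 0)
        return -1;
    if (lk.l_type == F_UNLCK)  /* nothing in the way */
        return 0;

    *holder = lk.l_pid;        /* owner of the first conflicting lock */
    return 1;
}
```

A return of 1 with a pid that no longer exists on the local machine would be consistent with a stale lock, though that alone does not identify the cause.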


Something to do with fcntl locking over NFS?  I have other servers running
under this same architecture without problems, and I am sure there are
boatloads of ISPs running Apache off of a NetApp.

-Rasmus

Re: Maybe it's me - more locking problems

Posted by ra...@bellglobal.com.
> Don't tempt fate.  "LockFile /tmp/accept.lock" or some other local disk.
> 
> NFS locking isn't, even though it may work sometimes.

Aww, sh*t...  Why didn't I think of that right away?

Amazing how this has been working for months and suddenly broke now.

Thanks.  It cleared the problem up right away.

-Rasmus

Re: Maybe it's me - more locking problems

Posted by Marc Slemko <ma...@worldgate.com>.
On Tue, 3 Feb 1998 rasmus@bellglobal.com wrote:

> The Gods must be against me.  Apparently I don't know how to run an Apache
> server at all.  Now I have a 1.2.5 server which will serve up one 
> request after a restart and then get stuck in a lock.
> 
> The only non-standard thing compiled in right now is mod_jserv, but the
> symptoms are the same without that module.
> 
> The other non-standard thing, I guess, is that Apache and all of its files
> are on an NFS-mounted Netapp filesystem.

Don't tempt fate.  "LockFile /tmp/accept.lock" or some other local disk.

NFS locking isn't, even though it may work sometimes.