Posted to dev@apr.apache.org by Michael Durket <du...@rlucier-home2.stanford.edu> on 2008/06/12 15:56:21 UTC

Re: Apache, Solaris, AcceptMutex and EDEADLK

(I originally sent this to Joe Orton, who suggested I post it to this
list instead.)


    I've recently been debugging an issue with Solaris, Apache and
EDEADLK. Turning to Google, I ran across several posts, including this
fairly recent one:

    http://www.mail-archive.com/dev@apr.apache.org/msg19804.html

    "The default was changed to fcntl because of the potential for
     deadlocks in use of cross-process pthread mutexes:

           http://marc.info/?l=apr-dev&m=108720968023158&w=2

     are those issues not seen any more? Since that decision was due to
     a potential OS bug (robust mutexes which aren't robust) has it been
     confirmed with Sun that this fcntl/EDEADLK is definitely not an OS
     bug?"

    I don't know whether a reply was ever received (I haven't found one
yet in my Google searching). From extensive DTrace debugging of Apache
2.2.8 locking behavior under Solaris 10, I can confirm that, at least in
my case, this is not a Solaris bug. Solaris is properly detecting the
classic deadlock case involving (at least) 2 locks, wherein process 1
holds lock A and wants lock B, and process 2 holds lock B and wants
lock A. I see this case occur in my DTrace output just before the
EDEADLK return.

    This always involves the AcceptMutex and one other lock, which is
usually a global mutex. It occurs because the Worker MPM is both
threaded and multi-process, so it's quite possible for 2 threads in one
Worker MPM process to be involved with different locks: one holding the
AcceptMutex, and the other wanting to lock, say, the mod_rewrite
RewriteLock. If another Worker MPM process then has one thread holding
the mod_rewrite RewriteLock and a second thread wanting the AcceptMutex,
EDEADLK will be returned to one of the waiters, because Solaris is
looking at the process level, not the thread level. If the locking were
treated as being at the thread level, there would be no deadlock.
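
    To make the process-versus-thread distinction concrete, here is a
minimal sketch in Python of a wait-for graph built at both levels. It is
only a model of the scenario described above, not Apache, APR or Solaris
code. The pids, thread names and helper functions (wait_for_graph,
has_cycle) are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

def wait_for_graph(holds, wants, key):
    """Build edges waiter -> holder, collapsing owners with key()."""
    graph = defaultdict(set)
    for waiter, lock in wants:
        holder = holds.get(lock)
        # An owner never waits on a lock it already holds itself.
        if holder is not None and key(holder) != key(waiter):
            graph[key(waiter)].add(key(holder))
    return graph

def has_cycle(graph):
    """Depth-first search for a cycle in the wait-for graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    state = {}

    def visit(node):
        state[node] = GREY
        for nxt in graph.get(node, ()):
            seen = state.get(nxt, WHITE)
            if seen == GREY:
                return True
            if seen == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state.get(n, WHITE) == WHITE and visit(n) for n in list(graph))

# Owners are hypothetical (pid, thread) pairs.
holds = {
    "AcceptMutex": (1, "A"),   # process 1, thread A holds the AcceptMutex
    "RewriteLock": (2, "C"),   # process 2, thread C holds the RewriteLock
}
wants = [
    ((1, "B"), "RewriteLock"),  # process 1, thread B wants the RewriteLock
    ((2, "D"), "AcceptMutex"),  # process 2, thread D wants the AcceptMutex
]

# Process-level ownership: only the pid identifies the owner.
process_deadlock = has_cycle(wait_for_graph(holds, wants, lambda o: o[0]))
# Thread-level ownership: the full (pid, thread) pair identifies the owner.
thread_deadlock = has_cycle(wait_for_graph(holds, wants, lambda o: o))

print("process-level deadlock:", process_deadlock)
print("thread-level deadlock:", thread_deadlock)
```

    In this model the process-level graph contains the cycle 1 -> 2 -> 1,
while the thread-level graph has no cycle, because the two threads that
hold the locks are not themselves waiting on anything.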

    I've seen that, for some people, setting AcceptMutex pthread fixed a
similar problem, but I was concerned about your comment quoted above.
Have you heard whether the lock robustness problems with cross-process
pthread mutexes have been solved?

     Sincerely yours,

     Michael Durket