Posted to dev@httpd.apache.org by Dale Ghent <da...@elemental.org> on 2000/02/01 14:37:55 UTC

2.0/dexter locking weirdness?

I'm playing with the 2.0 snapshots on a server (Sun Ultra2300, Solaris
7 with kernel patch 106541-09) here, and it does something rather nasty
when a whole slew of requests are thrown at it (using 'ab -c 1000 -u
1000').

What happens is, of the 5 non-root apache processes running, one or two
will end up sucking up all of the CPU. I did a truss on these particular
processes, and they are stuck in some kind of write() loop, where
any write() call to a network socket always errors out with EPIPE. To try
to see what led up to this condition, I truss'd one of the apache
processes from start to finish. From looking at the resulting output, 
the only abnormal thing that happens right before the process enters into
the runaway write() state is that fcntl() fails right after accept()ing a
connection with EDEADLK. From that point on, all write()s for all threads
in that apache process then loop and fail constantly with EPIPE.
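
For reference, the fcntl() call in question is presumably the one that
serializes accept() across processes; that is an assumption, not something
the truss output states. A minimal C sketch of such a lock, with the
hypothetical names accept_lock and accept_unlock, that hands EDEADLK back
to the caller instead of carrying on as if the lock were held:

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: take an exclusive whole-file lock on lock_fd,
 * blocking until it is granted.
 * Returns 0 on success, or the errno value on failure. */
int accept_lock(int lock_fd)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "to end of file" */

    while (fcntl(lock_fd, F_SETLKW, &fl) < 0) {
        if (errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        return errno;           /* e.g. EDEADLK: the lock is NOT held */
    }
    return 0;
}

/* Hypothetical helper: release the lock taken by accept_lock(). */
int accept_unlock(int lock_fd)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;

    if (fcntl(lock_fd, F_SETLK, &fl) < 0)
        return errno;
    return 0;
}
```

A caller would only go on to accept() when accept_lock() returns 0. Note
that POSIX fcntl() locks are owned by the process, so on their own they do
not exclude threads of the same process from one another.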

I saved the truss output to a file, and it is available at:

http://elemental.org/~daleg/httpd.out

The fun begins at line 4409. For those not familiar with truss, the
"/<number>:" that prepends each line is the thread number in the process.

This state is 100% reproducible. I'm going to try the pthread mutex
locking model next and see if the same thing happens there.

FWIW,
/dale

Dale Ghent
----------
"An event e's being F realizes e's being G just in case (i) e is F,
(ii) e is G, (iii) for all e it is (physically) necessary that if e
is F then e is G, and (iv) e's being F explains e's being G. "
		- Basis of Multirealizability



Re: 2.0/dexter locking weirdness?

Posted by Manoj Kasichainula <ma...@io.com>.
Note: I'm not very involved with code these days; I should be diving
back into the code in a week or two. I'll be very happy if someone
else fixes this stuff first, though (hint hint).

On Tue, Feb 01, 2000 at 05:37:55AM -0800, Dale Ghent wrote:
> 
> From looking at the resulting output, 
> the only abnormal thing that happens right before the process enters into
> the runaway write() state is that fcntl() fails right after accept()ing a
> connection with EDEADLK.

That's just weird. I can only think of a couple of cases where we'd
even have multiple locks, and none where we could possibly have a
deadlock condition. 

> From that point on, all write()s for all threads
> in that apache process then loop and fail constantly with EPIPE.

Hmmm. Every socket that gets opened is reclosed somehow? *shrug*

IIRC, Solaris 2.x was supposed to support cross-process pthread
locking, so I'd imagine Solaris 7 does by now. Are you familiar enough
with autoconf to try changing the default cross-process lock in APR to
a shared pthread mutex?
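
A minimal sketch of what a shared pthread mutex involves, assuming the
platform supports PTHREAD_PROCESS_SHARED and that mapping /dev/zero with
MAP_SHARED yields memory that forked children share with the parent. The
helper name create_shared_mutex is hypothetical; this is not APR's API:

```c
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create a mutex that can be locked from any
 * process that inherits the mapping across fork().
 * Returns NULL on failure. */
pthread_mutex_t *create_shared_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t *m;
    void *mem;
    int fd;
    int rc;

    /* The mutex must live in memory visible to every process using it. */
    fd = open("/dev/zero", O_RDWR);
    if (fd < 0)
        return NULL;
    mem = mmap(NULL, sizeof(*m), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping outlives the descriptor */
    if (mem == MAP_FAILED)
        return NULL;
    m = mem;

    if (pthread_mutexattr_init(&attr) != 0) {
        munmap(mem, sizeof(*m));
        return NULL;
    }

    /* Without this attribute the mutex is only valid within one process. */
    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);

    if (rc != 0) {
        munmap(mem, sizeof(*m));
        return NULL;
    }
    return m;
}
```

The parent would call this once before fork(); each child then uses
pthread_mutex_lock() and pthread_mutex_unlock() on the returned pointer
around accept().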

Note: I think we're eventually going to have to replace or supplement
any dynamic lock checks we have with some of the data we've collected from
1.3 on what actually works and what doesn't.

> I saved the truss output to a file, and it is available at:
> 
> http://elemental.org/~daleg/httpd.out

Hmmmm, looks like Apache is completely ignoring return values from the
write() calls. The /nn: at the beginning of the line is a thread
number, right?

Whoa! Why is APR's ap_send call *always* returning APR_SUCCESS? This
could explain why Apache isn't reacting to the EPIPE.
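
To illustrate the point about return values: a minimal sketch of a send
loop that stops on the first hard error and hands the errno value back to
the caller. The names send_all and ignore_sigpipe are hypothetical; this
is not the APR code. It assumes SIGPIPE is ignored, so that write()
returns EPIPE instead of killing the process:

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: make write() on a closed peer fail with EPIPE
 * rather than deliver a fatal SIGPIPE. */
void ignore_sigpipe(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/* Hypothetical helper: write all len bytes of buf to fd.
 * Returns 0 on success, or the errno value of the first hard failure. */
int send_all(int fd, const char *buf, size_t len)
{
    size_t sent = 0;

    while (sent < len) {
        ssize_t n = write(fd, buf + sent, len - sent);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return errno;       /* EPIPE etc.: stop and tell the caller */
        }
        sent += (size_t)n;
    }
    return 0;
}
```

A caller that sees a nonzero result, EPIPE in particular, would drop the
connection rather than call write() again.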