Posted to dev@httpd.apache.org by Ryan Bloom <rb...@raleigh.ibm.com> on 1999/04/26 22:22:51 UTC

fix for hybrid server problems.

I am prefacing this with "This is a twisted solution, but it works VERY
reliably."  This is a solution that comes from a couple of different
people within IBM, and I am summarizing it here.

We have a number of problems in the hybrid server that all have to do with
shutting down the server.

1) Can't break out of fcntl without a signal.
2) Can't break out of select easily.
3) We could use pipes to break out of select, but as far as I can tell,
you can't use select with pipes on Windows NT(95|98?????)
4) We have threads serving requests that we can't kill in non-graceful
shutdown cases.
5) If we are handling a graceful shutdown, we ignore non-graceful shutdown
requests until we are done.  (Manoj says he knows how to solve this one.)
6) Inter- and intra-process locking seem to be interfering with each
other.

The solution:
All threads accept SIGWINCH (graceful) and SIGHUP (non-graceful).  When a
thread gets either signal, it sets its spot in the scoreboard, sets that
signal to SIG_IGN, checks to see if there are any more active threads, and
if so, re-sends the signal to its own process.

Basically, if we have N threads in each process, we send the signal N
times to make sure that all of our threads are dying off properly.
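
Something like this is what I have in mind (untested sketch; the
scoreboard helpers are made up, and note the caveat in the comments):

    #include <signal.h>
    #include <unistd.h>

    /* Stand-ins for the real scoreboard: */
    extern volatile sig_atomic_t thread_active[HARD_THREAD_LIMIT];
    extern int my_thread_slot;             /* set at thread startup */
    extern int active_thread_count(void);  /* scans the scoreboard  */

    static void shutdown_handler(int sig)
    {
        /* Mark our own slot as dying. */
        thread_active[my_thread_slot] = 0;

        /* Stop listening to this signal.  (Caveat: signal
         * dispositions are per-process, not per-thread, so a real
         * version has to juggle per-thread signal *masks* instead
         * -- that is where this gets hairy.) */
        signal(sig, SIG_IGN);

        /* If anyone else is still alive, re-send the signal to our
         * own process so the next thread sees it too. */
        if (active_thread_count() > 0)
            kill(getpid(), sig);
    }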

The problems with this solution:
1) We end up relying on signals in the children.  We didn't want to do
this.
2) We have to put back in all of the block and unblock alarms code.
This would be more annoying than anything else.

The plusses to doing this:
1) If we abstract out the signal call in APR, this could be VERY portable.
2) It is guaranteed to work, every time.
3) It mimics what 1.3 does now, so we know it is a viable solution.
4) It allows us to support non-threaded OSes more easily, because all
OSes, whether they have threading or not, have their children handle
shutdown the same way.
5) It also allows a -T option (single-threaded, for debugging
purposes) (same as #4)
6) It solves all of our current problems with shutdown.

Okay, I am ready for the flames that this solution should bring my way.
Please, be gentle.

Ryan

_______________________________________________________________________
Ryan Bloom		rbb@raleigh.ibm.com
4205 S Miami Blvd	
RTP, NC 27709		It's a beautiful sight to see good dancers 
			doing simple steps.  It's a painful sight to
			see beginners doing complicated patterns.	



Re: fix for hybrid server problems.

Posted by Manoj Kasichainula <ma...@io.com>.
On Mon, Apr 26, 1999 at 04:22:51PM -0400, Ryan Bloom wrote:
> All threads accept SIGWINCH (graceful) and SIGHUP (non-graceful).  When a
> thread gets either signal, it sets its spot in the scoreboard, sets that
> signal to SIG_IGN, checks to see if there are any more active threads, and
> if so, re-sends the signal to its own process.

We found some problems with this approach, one being that we can't do
per-thread signal masking without a pthread call, and we can't make
pthreads calls from signal handlers (grumble).

However, we've heard (and tested on Linux) that signal handlers run
uninterrupted as far as other threads in a process are concerned. If
true, at least for server shutdown, we could simply put the
clean_child_exit() inside a signal handler. But, the pools use
mutexes, and you can't make mutex calls inside signal handlers. So,
this would only be possible if we could pass arguments in to the pool
routines not to use mutexes. This is ugly, though.

The fundamental problem is that shutdown has the semantics of killing
the server very quickly. Those semantics mean we have to kill threads
in the middle of requests. If we didn't have third-party modules, the
worker threads could regularly check a shutdown flag. But we do have
third-party modules, and they could put long delays between flag
checks.  So, it seems we *have* to use some form of asynchronous
cancellation: either just exit()ing while our threads are running, or
Unix signals, or pthread_cancel(). And Unix signals just cause no end
of grief when mixed with threads.

Personally, I'm starting to lean towards pthread_cancel in deferred
mode to kill our worker threads when we get SIGTERM. Asynchronous
cancellation is evil, but in the hybrid server right now, we're just
exit()ing while our other threads are running, so we already have an
effective form of asynch cancellation. Of course, the problem is that
we could run into cancellation-unsafe third-party libraries. I think
that a module written to interface the third-party library could block
cancellations if that's a concern, but it's only half an answer.
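
The shape I'm thinking of, as a toy (not httpd code; deferred is
actually the default, but it's spelled out here):

    #include <pthread.h>
    #include <unistd.h>

    static void cleanup(void *arg)
    {
        /* release per-thread resources; runs when the thread is
         * cancelled at a cancellation point */
    }

    static void *worker(void *arg)
    {
        pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
        pthread_cleanup_push(cleanup, arg);
        for (;;) {
            /* read(), select(), etc. are cancellation points, so
             * even a thread stuck in a long request gets reaped at
             * a well-defined spot instead of mid-instruction */
            sleep(1);
        }
        pthread_cleanup_pop(0);   /* not reached; pairs the push */
        return NULL;
    }

    /* on SIGTERM, for each worker:
     *     pthread_cancel(tid);
     *     pthread_join(tid, NULL);   -- waits for cleanup to run
     */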

Thoughts?

-- 
Manoj Kasichainula - manojk at io dot com - http://www.io.com/~manojk/
"Yes, I'm Linus, and I'm your God" -- Linus Torvalds, Linux Expo '98

Re: fix for hybrid server problems.

Posted by Ryan Bloom <rb...@raleigh.ibm.com>.
We considered using a socket to break us out of select.  Does that have
security issues associated with it?  On UNIX, I can use Unix Domain
Sockets; do I have a similar thing on (Windows(NT|98|95) && OS/2)?
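
On the UNIX side I'm picturing roughly this (sketch; the Win32 half is
exactly the open question):

    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static int wakeup[2];  /* [0] sits in the select set,
                            * [1] is written to wake us up */

    int setup_wakeup(void)
    {
        return socketpair(AF_UNIX, SOCK_STREAM, 0, wakeup);
    }

    void wait_for_work(int listen_fd)
    {
        fd_set rfds;
        int maxfd = (listen_fd > wakeup[0] ? listen_fd : wakeup[0]);

        FD_ZERO(&rfds);
        FD_SET(listen_fd, &rfds);
        FD_SET(wakeup[0], &rfds);
        select(maxfd + 1, &rfds, NULL, NULL, NULL);
        if (FD_ISSET(wakeup[0], &rfds)) {
            char c;
            read(wakeup[0], &c, 1);   /* drain the byte */
            /* ...someone wants us to shut down... */
        }
    }

    /* anyone wanting to kick the select loop: */
    void kick(void) { write(wakeup[1], "x", 1); }

On Win32 the same trick would presumably need a loopback TCP socket,
since select() there only works on sockets.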

There was also the idea that all interprocess communication could be done
using this pipe/socket between parent and child.  This means all of our
OSes would behave the same way for interprocess communication.  But using
a socket or a pipe this way requires at least one thread to be sitting in
the select at all times in order to receive a message from the parent.  In
general a BAD idea.

The socket also doesn't take care of the case where we get a message from
the parent process to shut down immediately, we break out of select, and
start freeing pools while we still have threads doing work.

This has other concerns; not likely to happen, but it is possible.
(Twenty threads are out serving requests.  Each has its own sub-pool, 1-20.)

Thread 21 is in select, and it gets kicked out, and starts cleaning things
up using clean_child_exit.  We clean pool 1, and then context switch.
Thread 2 is activated, and it needs more memory, so it allocates more -- oh
wait, there is space at the beginning of the pool.  It takes that space,
and writes some data from a secure web page to it.  Thread 1 is woken up,
and it reads memory from its pool to send data out over the network.  In
a good (lucky) case, it just sends out the secure data that was written
into what it considers its memory area.  In the bad case, we seg-fault,
and we never do the child_exit phase of the modules.

This is why we considered doing a signal loop.  It basically tells all the
threads, either finish this request and stop or "STOP NOW!"  In either
case, we will know that all threads are going to go away and when they do,
we can call clean_child_exit safely.

Ryan

On Tue, 27 Apr 1999, Ben Laurie wrote:

> Ryan Bloom wrote:
> > 3) We could use pipes to break out of select, but as far as I can tell,
> > you can't use select with pipes on Windows NT(95|98?????)
> 
> So connect to yourself via a socket.
> 
> Cheers,
> 
> Ben.
> 
> --
> http://www.apache-ssl.org/ben.html
> 
> "My grandfather once told me that there are two kinds of people: those
> who work and those who take the credit. He told me to try to be in the
> first group; there was less competition there."
>      - Indira Gandhi
> 

_______________________________________________________________________
Ryan Bloom		rbb@raleigh.ibm.com
4205 S Miami Blvd	
RTP, NC 27709		It's a beautiful sight to see good dancers 
			doing simple steps.  It's a painful sight to
			see beginners doing complicated patterns.	



Re: fix for hybrid server problems.

Posted by Ben Laurie <be...@algroup.co.uk>.
Ryan Bloom wrote:
> 3) We could use pipes to break out of select, but as far as I can tell,
> you can't use select with pipes on Windows NT(95|98?????)

So connect to yourself via a socket.

Cheers,

Ben.

--
http://www.apache-ssl.org/ben.html

"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
     - Indira Gandhi

Re: fix for hybrid server problems.

Posted by Greg Stein <gs...@lyra.org>.
Greg Stein wrote:
>...
> > > If we solve the above problem (and log rotation), this sounds
> > > reasonable.  Do we mind that there will be no way to get a mod_status
> > > display of our gracefully dying children after a graceful shutdown?
> >
> > Didn't think about that... hmm.
> 
> I would think that the modules wouldn't receive the graceful shutdown
> until they were done with their activity. In that case, the shutdown
> shouldn't take very long. It seems that you'd have a relatively narrow
> window where mod_status wouldn't work for the server.
> 
> (I mean really... you *are* telling the thing to shutdown/restart... why
> shouldn't it be allowed to punt your request for a while in there?  Did
> I miss something? This seems a pretty easy point here)

To answer my own question (I think :-) :

When the event thread gets the shutdown request, it will stop handling
requests -- including the /server-status request. It will stay that way
until all activity stops, which could be a while if an active request
takes a long time to complete.

IMO, tough. You told it to shutdown... it effectively *is* shutdown, so
how can you expect to get a status report? :-)


An interesting question is "what is the best way to quickly stop/restart
a highly loaded web server, which may also have long-term requests?"
Refusing connections (effectively) may not be an option. Is this where
the hard-kill comes in? :-)
[ this also argues for a more dynamic configuration system, since config
changes are what graceful restarts are usually used for ]

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

Re: fix for hybrid server problems.

Posted by Greg Stein <gs...@lyra.org>.
Dean Gaudet wrote:
> On Tue, 4 May 1999, Manoj Kasichainula wrote:
> > Can we rely on close() or shutdown() getting rid of our listening
> > sockets while we're selecting on them?
> 
> The event thread is where we learn that we're supposed to do a graceful
> shutdown... so it's trivial to make sure we don't select on them.
> Unfortunately we may need to either spawn a new thread to call the
> graceful shutdown methods, or we'll have to require the graceful shutdown
> methods to be non-blocking.  Hmm.

Nah. Just have a third item type in the request queue: "run graceful
shutdown methods". When the response comes back, the event thread can
die off.

Right after the event thread throws that request into the queue, it can
also place N "thread shutdown" requests into the queue. Each thread pops
that off the queue, cleans itself up, decrements some semaphore, and
terminates itself. (the event thread would block on this semaphore,
waiting for all child threads to die before really dying)

Maybe the ordering and details in there aren't quite right, but it seems
like a start.

(for example, you may need to wait for all activity to stop before
placing the graceful-shutdown request into the queue)
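
In pseudo-C, with the queue ops assumed (made-up names):

    #include <pthread.h>
    #include <semaphore.h>

    enum item_type {
        NEW_CONNECTION,
        FINISHED_RESPONSE,
        RUN_SHUTDOWN_HOOKS,     /* the "third item type" */
        THREAD_SHUTDOWN
    };

    extern void queue_put(enum item_type t, void *data);
    extern enum item_type queue_get(void **data);

    static sem_t dead_workers;  /* sem_init(&dead_workers, 0, 0) */

    static void *worker(void *unused)
    {
        void *data;
        for (;;) {
            switch (queue_get(&data)) {
            case NEW_CONNECTION:     /* process the request */   break;
            case FINISHED_RESPONSE:  /* log it */                break;
            case RUN_SHUTDOWN_HOOKS: /* graceful methods */      break;
            case THREAD_SHUTDOWN:
                /* clean self up, then: */
                sem_post(&dead_workers);
                return NULL;
            }
        }
    }

    /* event thread, once it decides to die: */
    static void graceful_exit(int nthreads)
    {
        int i;
        queue_put(RUN_SHUTDOWN_HOOKS, NULL);
        for (i = 0; i < nthreads; i++)
            queue_put(THREAD_SHUTDOWN, NULL);
        for (i = 0; i < nthreads; i++)
            sem_wait(&dead_workers);  /* block until they're all gone */
        /* now the process can really exit */
    }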

> > > graceful restart and graceful shutdown are the two suggested forms
> > > of restarting and shutting down the server.  They're the safe forms.
> >
> > If we solve the above problem (and log rotation), this sounds
> > reasonable.  Do we mind that there will be no way to get a mod_status
> > display of our gracefully dying children after a graceful shutdown?
> 
> Didn't think about that... hmm.

I would think that the modules wouldn't receive the graceful shutdown
until they were done with their activity. In that case, the shutdown
shouldn't take very long. It seems that you'd have a relatively narrow
window where mod_status wouldn't work for the server.

(I mean really... you *are* telling the thing to shutdown/restart... why
shouldn't it be allowed to punt your request for a while in there?  Did
I miss something? This seems a pretty easy point here)

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

Re: fix for hybrid server problems.

Posted by Dean Gaudet <dg...@arctic.org>.
On Tue, 4 May 1999, Rodent of Unusual Size wrote:

> Dean Gaudet wrote:
> > 
> > On Tue, 4 May 1999, Manoj Kasichainula wrote:
> > 
> > >              Do we mind that there will be no way to get a mod_status
> > > display of our gracefully dying children after a graceful shutdown?
> > 
> > Didn't think about that... hmm.
> 
> Surely they'll still be displayable from other processes that
> have completed the restart?  Just not from any that are still
> in the throes.

I think there are two issues:

- a process won't exit (and free up its 64 scoreboard slots) until it is
  done serving all of its requests... 64 slots is a lot of memory

- we have no way to note the static fd/mmap responses that are being taken
  care of by the event thread

Shared memory is expensive on some systems -- it's non-swappable.  The
event thread will give us the possibility of handling thousands upon
thousands of long haul clients... which makes it really hard to statically
allocate enough memory to record all the requests. 

I'm thinking that a better solution would be to record only processes in
the shared memory scoreboard.  (There's probably a few stats we can record
on a per-process level.) 

Then within each process, we dynamically allocate scoreboard information
for each connection and store it in a linked list.  Plus we provide a
mechanism for processes to fetch this list from other processes. 

There's a few ways to do this... I haven't thought hard about it yet. 

Finally, we hide all of this under a simple API:  ap_open_scoreboard(),
ap_read_scoreboard(), ap_close_scoreboard()... which works somewhat like
opendir/readdir.  We need to hide it this way -- because the scoreboard
will be different on NT, for example, where multiple processes are
impossible (can't do accept() on same socket in multiple processes).
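
To make the opendir/readdir analogy concrete (all of this is
hypothetical shape, not code that exists):

    #include <sys/types.h>

    typedef struct ap_scoreboard ap_scoreboard;  /* opaque handle */

    typedef struct {
        pid_t pid;
        int   thread_num;
        int   status;          /* READY, BUSY, DYING, ... */
        char  client[32];
        char  request[64];
    } ap_score_entry;

    ap_scoreboard  *ap_open_scoreboard(void);
    ap_score_entry *ap_read_scoreboard(ap_scoreboard *sb); /* NULL at end */
    void            ap_close_scoreboard(ap_scoreboard *sb);

    /* mod_status then just walks it, never knowing whether the
     * entries came from shared memory, another process's linked
     * list, or (on NT) the one and only process: */
    void dump_status(void)
    {
        ap_scoreboard *sb = ap_open_scoreboard();
        ap_score_entry *e;

        while ((e = ap_read_scoreboard(sb)) != NULL)
            ;  /* format *e into the status page */
        ap_close_scoreboard(sb);
    }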

Dean


Re: fix for hybrid server problems.

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.
Dean Gaudet wrote:
> 
> On Tue, 4 May 1999, Manoj Kasichainula wrote:
> 
> >              Do we mind that there will be no way to get a mod_status
> > display of our gracefully dying children after a graceful shutdown?
> 
> Didn't think about that... hmm.

Surely they'll still be displayable from other processes that
have completed the restart?  Just not from any that are still
in the throes.
-- 
#ken    P-)}

Ken Coar                    <http://Web.Golux.Com/coar/>
Apache Software Foundation  <http://www.apache.org/>
"Apache Server for Dummies" <http://Web.Golux.Com/coar/ASFD/>

Re: fix for hybrid server problems.

Posted by Dean Gaudet <dg...@arctic.org>.

On Tue, 4 May 1999, Manoj Kasichainula wrote:

> Can we rely on close() or shutdown() getting rid of our listening
> sockets while we're selecting on them?

The event thread is where we learn that we're supposed to do a graceful
shutdown... so it's trivial to make sure we don't select on them. 
Unfortunately we may need to either spawn a new thread to call the
graceful shutdown methods, or we'll have to require the graceful shutdown
methods to be non-blocking.  Hmm. 

> > graceful restart and graceful shutdown are the two suggested forms
> > of restarting and shutting down the server.  They're the safe forms.
> 
> If we solve the above problem (and log rotation), this sounds
> reasonable.  Do we mind that there will be no way to get a mod_status
> display of our gracefully dying children after a graceful shutdown?

Didn't think about that... hmm. 

> 
> > logging:  yeah, there is difficulty with log rotation if graceful
> > restart is the only restart available -- there's no way for an
> > external program to know that all children have finished writing
> > the rotated logs.  There's a few possibilities for solving this...
> 
> Do the words "reliable", "piped", and "logs" get used, in that order?

That's one way yeah.  I think there are others.  Nothing makes me entirely
happy though -- hoping someone will come up with a killer solution.  The
one thing which irks me about piped logs is that many systems have small
pipe buffers.  What we really want is a deep buffer with priority
inversion -- force the log process to a lower priority, so it's only
awakened when the buffer is full.  Otherwise you incur extra context
switches.

I think you can tune the buffer size for socketpair()s on many systems... 
that's part of the solution. 

Maybe there's a slick solution using shared memory. 

If you want to log directly to files, you can use a lock to figure out if
any processes are currently writing to a log.  Open the log and put a
shared lock on it for each httpd child (locks are advisory on most
systems, use a separate lock file for systems with mandatory locks).  Then
the log rotator rotates the log, sends the graceful restart signal, and
tries to acquire an exclusive lock.  It won't acquire the lock until the
last child exits.
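
Sketched with flock() -- fcntl() locks would work the same way:

    #include <sys/file.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* each httpd child, when it opens the log: */
    int open_log(const char *path)
    {
        int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
        flock(fd, LOCK_SH);   /* shared: every child holds one */
        return fd;
    }

    /* the rotator, after rename()ing the log out of the way and
     * sending the graceful restart signal: */
    void wait_for_old_log(const char *oldpath)
    {
        int fd = open(oldpath, O_WRONLY);
        flock(fd, LOCK_EX);   /* blocks until the last child
                               * holding a shared lock exits  */
        close(fd);
        /* safe to compress/ship the old log now */
    }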

Piped logs are probably a good method to start with. 

> > You'll be surprised to hear me bring up an acceptor thread again... but
> > I have a reason for returning its existence:  to service static requests.
> 
> It sounds cooler and cooler the more I think about it (which isn't
> much yet), but I really was hoping to get rid of that intervening
> request queue. :( Well, at least with only one thread doing the
> pushing on that queue, we can get rid of a pair of mutex calls.

I'm pretty sure we won't have much contention on that mutex... assume we
have one process per processor.  The mutex is local to a processor, and
the operations are short... 

The one which bothers me a bit more is the response queue which is a
pipe() -- that's two extra syscalls.  But they only happen on responses
longer than SO_SNDBUF bytes.  And I think it'll be worth it -- we can
handle far more long haul clients this way.

Dean



Re: fix for hybrid server problems.

Posted by Manoj Kasichainula <ma...@io.com>.
I'm reordering Dean's ideas, from easiest to hardest. :)

On Mon, May 03, 1999 at 12:05:22PM -0700, Dean Gaudet wrote:
> MaxRequestsPerChild (perthread, whatever) are best guesses.  We
> don't guarantee that we'll hit them dead on.  When a worker thread
> notices MaxRequestsPerChild has been hit, it sends an event to the
> event thread, the event thread initiates a graceful shutdown of
> the child process.

This is actually all true right now, just done in a different way.
MaxRequestsPerChild handling isn't mutexed, so we can miss a few
decrements of requests_this_child.  And, once we've hit
MaxRequestsPerChild, we raise SIGWINCH, and try to convince our
threads to die. It would be nice to replace this with a pipe, though.

> Add a new API phase -- "graceful shutdown".  This is invoked when
> the parent asks the process to shutdown -- remember to a httpd child
> there is no distinction between graceful shutdown or graceful restart.
> When the graceful shutdown occurs, modules (and the core) should close
> up any resource which might prevent another instance of the webserver
> from starting... such as listening sockets.

Can we rely on close() or shutdown() getting rid of our listening
sockets while we're selecting on them?

> graceful restart and graceful shutdown are the two suggested forms
> of restarting and shutting down the server.  They're the safe forms.

If we solve the above problem (and log rotation), this sounds
reasonable.  Do we mind that there will be no way to get a mod_status
display of our gracefully dying children after a graceful shutdown?

> logging:  yeah, there is difficulty with log rotation if graceful
> restart is the only restart available -- there's no way for an
> external program to know that all children have finished writing
> the rotated logs.  There's a few possibilities for solving this...

Do the words "reliable", "piped", and "logs" get used, in that order?

> You'll be surprised to hear me bring up an acceptor thread again... but
> I have a reason for returning its existence:  to service static requests.

It sounds cooler and cooler the more I think about it (which isn't
much yet), but I really was hoping to get rid of that intervening
request queue. :( Well, at least with only one thread doing the
pushing on that queue, we can get rid of a pair of mutex calls.

-- 
Manoj Kasichainula - manojk at io dot com - http://www.io.com/~manojk/
"Would you die for The One?"
"I wouldn't get pizza for the one. That ain't my job." - J.M. Straczynski

Re: fix for hybrid server problems.

Posted by Manoj Kasichainula <ma...@io.com>.
On Sun, May 09, 1999 at 01:54:58PM -0700, Dean Gaudet wrote:
> I care less about static benchmark tests on local networks with no latency
> than I do about real life web sites with loads of long haul, slow clients,
> downloading large files consuming an expensive resource:  a thread slot
> and stack.

Ahhh. /me thwaps forehead. Makes sense. I am now fully convinced that
this is extremely cool.

Anyway, unless someone beats me to it (and please do), I'd like to
help implement something like this as soon as the server-wide shutdown
notification pipe is done.  Unfortunately, I'll have little time this
week for any such things.

-- 
Manoj Kasichainula - manojk at io dot com - http://www.io.com/~manojk/
"'Why do you blow on people?' I don't know." -- Benny Hinn

Re: fix for hybrid server problems.

Posted by Dan Kegel <da...@alumni.caltech.edu>.
Vivek Sadananda Pai wrote:
> I'll be presenting those measurements as part of a paper that'll
> appear at the next Usenix conference. If you've got postscript,
> the paper is available at http://www.cs.rice.edu/~vivek/flash99/

Interesting.  I've quoted you and linked to your paper from
http://www.kegel.com/c10k.html
- Dan

Re: fix for hybrid server problems.

Posted by Vivek Sadananda Pai <vi...@cs.rice.edu>.
Manoj Kasichainula <ma...@io.com> wrote:
> 
> My understanding is that a select()-based server is fast because you
> don't have to deal with many context switches, right?

I've compared the raw performance of a select-based server with a
multiple-process server on both FreeBSD and Solaris/x86. On
microbenchmarks, there's only a marginal difference in performance
stemming from the software architecture. The big performance win for
select-based servers stems from doing application-level caching. While
multiple-process servers can do it at a higher cost, it's harder to
get the same benefits on real workloads (vs microbenchmarks).

I'll be presenting those measurements as part of a paper that'll
appear at the next Usenix conference. If you've got postscript,
the paper is available at http://www.cs.rice.edu/~vivek/flash99/

-Vivek

P.S. I'll be out of e-mail contact for about a week or so, but will
read the list again when I get back

Re: fix for hybrid server problems.

Posted by Dean Gaudet <dg...@arctic.org>.

On Mon, 17 May 1999, Dan Kegel wrote:

> No need for a second process on those OS's that support sendfile(),
> I think, 'cause then the disk I/O is done in the background for
> you by the kernel.
> (Cool, another reason to try sendfile()!)

sendfile() blocks as well.  At least on Linux it does, and I'd be
surprised if it didn't block elsewhere.  There is no "completion" call for
sendfile() -- you need a completion call in order to do things
asynchronously. 

Or you can peek at the linux kernel source, mm/filemap.c, search for
do_generic_file_read, notice the wait_on_page() call. 
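
In other words, even used "non-blockingly" you only dodge the socket
half of it (Linux sketch):

    #include <sys/types.h>
    #include <sys/sendfile.h>
    #include <errno.h>

    /* push the next chunk of file_fd out a non-blocking socket */
    ssize_t push_file(int sock, int file_fd, off_t *off, size_t left)
    {
        ssize_t n = sendfile(sock, file_fd, off, left);

        if (n < 0 && errno == EAGAIN)
            return 0;  /* socket buffer full -- select() and retry */

        /* Either way, sendfile() may *already* have slept in the
         * kernel waiting on the disk read (wait_on_page); EAGAIN
         * only covers the socket side, and there's no way to be
         * told "the page is in core now". */
        return n;
    }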

Dean



Re: fix for hybrid server problems.

Posted by Dan Kegel <da...@alumni.caltech.edu>.
Vivek Sadananda Pai wrote:
> On Sun, 9 May 1999, Dean Gaudet <dg...@arctic.org> wrote:
> > So far my largest concern, which is the same as my concern with entirely
> > select-based servers, is disk i/o.  This all works wonderfully if you
> > rarely have to page from disk.  But for servers with large working sets,
> > aggregating like this hurts because you have only a single i/o request
> > outstanding at a time... with multiple processes we alleviate some of this
> > problem... there are probably other options...
> 
> I found two problems with purely select-based servers with regard to
> having only a single disk I/O available. The first, of course, is that
> you can't parallelize disk accesses (scheduling or multiple disks),
> but more importantly, while the process is blocked, no other
> user-level processing occurs.

No need for a second process on those OS's that support sendfile(),
I think, 'cause then the disk I/O is done in the background for
you by the kernel.
(Cool, another reason to try sendfile()!)
- Dan

(p.s. Sorry if this was a repost.)

Re: fix for hybrid server problems.

Posted by Vivek Sadananda Pai <vi...@cs.rice.edu>.
On Mon, 17 May 1999, Michael Anderson <mk...@gto1.telmex.net.mx> wrote:
> > If Apache goes partially select-based, the Flash model might
> > be useful to address the disk problems. Details are in the
> > upcoming Usenix paper, available at
> > http://www.cs.rice.edu/~vivek/flash99/
> 
> Is Flash and IO-Lite source publically available?

I plan to release them both at some point, under something resembling
a "free for non-commercial use" license. However, this isn't going to
happen anytime soon, since I'm tied up for the next month, and then I
have to talk to some lawyers about IP issues.

Also, someone (I think Dan) mentioned sendfile, and I received a
message about using aio. Both approaches can make a select-based
server nonblocking[*] for some things, but it's not universal. For
example, metadata accesses (like open() or stat()) don't get made
asynchronous this way. It's also not fully portable.

In any case, it's not terribly important, because going from a
select-based server to something like what Flash does is relatively
straightforward. It's basically just adding extra states to the
select-driven state machine.
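
For example, the per-connection state enum just grows a few entries
(illustrative names only):

    enum conn_state {
        ST_READ_REQUEST,
        ST_SEND_RESPONSE,
        ST_WAIT_DISK_READ,    /* new: a helper is fetching file data  */
        ST_WAIT_METADATA,     /* new: a helper is doing open()/stat() */
        ST_LOG
    };

The select loop treats the new states like any other "not ready yet"
condition and just moves on to the next connection.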

-Vivek

* assuming the OS does things the "right" way

Re: fix for hybrid server problems.

Posted by Michael Anderson <mk...@gto1.telmex.net.mx>.
Vivek Sadananda Pai wrote:

> If Apache goes partially select-based, the Flash model might
> be useful to address the disk problems. Details are in the
> upcoming Usenix paper, available at
> http://www.cs.rice.edu/~vivek/flash99/

Is the Flash and IO-Lite source publicly available?

Regards,
-- 
Mike Anderson
mka@ringrosa.com
+52 473 25789 voice
+52 473 24837 fax
Guanajuato, GTO, Mexico

Re: fix for hybrid server problems.

Posted by Vivek Sadananda Pai <vi...@cs.rice.edu>.
On Sun, 9 May 1999, Dean Gaudet <dg...@arctic.org> wrote:
> 
> So far my largest concern, which is the same as my concern with entirely
> select-based servers, is disk i/o.  This all works wonderfully if you
> rarely have to page from disk.  But for servers with large working sets,
> aggregating like this hurts because you have only a single i/o request
> outstanding at a time... with multiple processes we alleviate some of this
> problem... there are probably other options -- but I think this is
> something we can deal with when we get there. 

I found two problems with purely select-based servers with regard to
having only a single disk I/O available. The first, of course, is that
you can't parallelize disk accesses (scheduling or multiple disks),
but more importantly, while the process is blocked, no other
user-level processing occurs. 

It turns out to be relatively simple to adapt a select-based server to
use other processes for all disk data and metadata accesses, and
that's one of the goals of the Flash server - measurements suggest
that its performance across cache-friendly or even disk-bound
workloads is competitive with any other model. 

If Apache goes partially select-based, the Flash model might be useful
to address the disk problems. Details are in the upcoming Usenix paper,
available at http://www.cs.rice.edu/~vivek/flash99/

-Vivek


Re: fix for hybrid server problems.

Posted by un...@riverstyx.net.
That's pretty much all I've seen under Apache.  I've seen far more with
thttpd.  I'm guessing that a major limitation was that scheduling problem
in the Linux kernel, and I hear that was fixed.  I haven't had a chance to
play with it yet though.

---
tani hosokawa
river styx internet


On Sun, 9 May 1999, Vincent Janelle wrote:

> unknown@riverstyx.net wrote:
> > 
> > Which is what I use for load testing Apache :-)  256 clients all running
> > at 115.2kbps is still a ton of traffic.  25 megabit, roughly.  With each
> > of those opening multiple connections (which is standard for most
> > browsers) you'll end up with probably 1200 simultaneous connections, which
> > is going to beat the hell out of both Linux and Apache in their current
> > states.
> Only 1200 connections?  I've seen more with a few other servers.. (I
> think the highest was 6400 with roxen on linux).
> 
> What is the main limitation of apache with its forking model?  Mainly
> the operating system, or things like memory management?  
> 
> > 
> > ---
> > tani hosokawa
> > river styx internet
> > 
> > On Sun, 9 May 1999, Marc Slemko wrote:
> > 
> > > On Sun, 9 May 1999 unknown@riverstyx.net wrote:
> > >
> > > > I was thinking, in order for you guys to better benchmark Apache against
> > > > real-world types of stresses (since you obviously don't have high volume
> > > > real world servers that you can just play with on a whim) if you could get
> > > > a network set up with, say, 8 32-port DigiBoards on 8 low-end Pentium
> > > > routers, and putting all the test servers from there, running SLIP from
> > > > the serial ports, you could effectively simulate the common long haul
> > > > clients.  256 bandwidth constrained clients would be much more like what a
> > > > real webserver deals with.
> > >
> > > You would need a pretty tiny or inefficient server to saturate it
> > > with 256 low speed connections.  Simulating real world situations isn't
> > > easy.
> > >
> > > I still think the real way to do that sort of testing is just to get
> > > someone to donate bandwidth, and set up a free porn site.  I am actually at
> > > least somewhat serious.
> > >
> 
> -- 
> ------------
> If life is merely a joke, the question still remains: for whose
> amusement?
> --http://random.gimp.org --mailto:random@gimp.org --UIN 23939474
> 


Re: fix for hybrid server problems.

Posted by Vincent Janelle <vj...@home.com>.
unknown@riverstyx.net wrote:
> 
> Which is what I use for load testing Apache :-)  256 clients all running
> at 115.2kbps is still a ton of traffic.  25 megabit, roughly.  With each
> of those opening multiple connections (which is standard for most
> browsers) you'll end up with probably 1200 simultaneous connections, which
> is going to beat the hell out of both Linux and Apache in their current
> states.
Only 1200 connections?  I've seen more with a few other servers.. (I
think the highest was 6400 with roxen on linux).

What is the main limitation of apache with its forking model?  Mainly
the operating system, or things like memory management?  

> 
> ---
> tani hosokawa
> river styx internet
> 
> On Sun, 9 May 1999, Marc Slemko wrote:
> 
> > On Sun, 9 May 1999 unknown@riverstyx.net wrote:
> >
> > > I was thinking, in order for you guys to better benchmark Apache against
> > > real-world types of stresses (since you obviously don't have high volume
> > > real world servers that you can just play with on a whim) if you could get
> > > a network set up with, say, 8 32-port DigiBoards on 8 low-end Pentium
> > > routers, and putting all the test servers from there, running SLIP from
> > > the serial ports, you could effectively simulate the common long haul
> > > clients.  256 bandwidth constrained clients would be much more like what a
> > > real webserver deals with.
> >
> > You would need a pretty tiny or inefficient server to saturate it
> > with 256 low speed connections.  Simulating real world situations isn't
> > easy.
> >
> > I still think the real way to do that sort of testing is just to get
> > someone to donate bandwidth, and set up a free porn site.  I am actually at
> > least somewhat serious.
> >

-- 
------------
If life is merely a joke, the question still remains: for whose
amusement?
--http://random.gimp.org --mailto:random@gimp.org --UIN 23939474

Re: fix for hybrid server problems.

Posted by un...@riverstyx.net.
Which is what I use for load testing Apache :-)  256 clients all running
at 115.2kbps is still a ton of traffic.  25 megabit, roughly.  With each
of those opening multiple connections (which is standard for most
browsers) you'll end up with probably 1200 simultaneous connections, which
is going to beat the hell out of both Linux and Apache in their current
states.

---
tani hosokawa
river styx internet


On Sun, 9 May 1999, Marc Slemko wrote:

> On Sun, 9 May 1999 unknown@riverstyx.net wrote:
> 
> > I was thinking, in order for you guys to better benchmark Apache against
> > real-world types of stresses (since you obviously don't have high volume
> > real world servers that you can just play with on a whim) if you could get
> > a network set up with, say, 8 32-port DigiBoards on 8 low-end Pentium
> > routers, and putting all the test servers from there, running SLIP from
> > the serial ports, you could effectively simulate the common long haul
> > clients.  256 bandwidth constrained clients would be much more like what a
> > real webserver deals with.
> 
> You would need a pretty tiny or inefficient server to saturate it
> with 256 low speed connections.  Simulating real world situations isn't
> easy.
> 
> I still think the real way to do that sort of testing is just to get
> someone to donate bandwidth, and set up a free porn site.  I am actually at
> least somewhat serious.
> 


Re: fix for hybrid server problems.

Posted by Marc Slemko <ma...@znep.com>.
On Sun, 9 May 1999 unknown@riverstyx.net wrote:

> I was thinking, in order for you guys to better benchmark Apache against
> real-world types of stresses (since you obviously don't have high volume
> real world servers that you can just play with on a whim) if you could get
> a network set up with, say, 8 32-port DigiBoards on 8 low-end Pentium
> routers, and putting all the test servers from there, running SLIP from
> the serial ports, you could effectively simulate the common long haul
> clients.  256 bandwidth constrained clients would be much more like what a
> real webserver deals with.

You would need a pretty tiny or inefficient server to saturate it
with 256 low speed connections.  Simulating real world situations isn't
easy.

I still think the real way to do that sort of testing is just to get
someone to donate bandwidth, and set up a free porn site.  I am actually at
least somewhat serious.


Re: fix for hybrid server problems.

Posted by un...@riverstyx.net.
I was thinking, in order for you guys to better benchmark Apache against
real-world types of stresses (since you obviously don't have high volume
real world servers that you can just play with on a whim) if you could get
a network set up with, say, 8 32-port DigiBoards on 8 low-end Pentium
routers, and putting all the test servers from there, running SLIP from
the serial ports, you could effectively simulate the common long haul
clients.  256 bandwidth constrained clients would be much more like what a
real webserver deals with.

---
tani hosokawa
river styx internet


On Sun, 9 May 1999, Dean Gaudet wrote:

> On Sun, 9 May 1999, Manoj Kasichainula wrote:
> 
> > My understanding is that a select()-based server is fast because you
>                                                    ^^
> > don't have to deal with many context switches, right?
> 
> s/is/can be/
> 
> > In this case though, we have at least 1 forced context switch for
> > every connection (to pass a new connection to a worker thread). And
> > for the slightly big files, we have to jump back to the event thread
> > and back again to a worker thread for logging.
> > 
> > With long files, you avoid the context switching between threads as
> > they write out the data, but couldn't this be largely eliminated with
> > mmap+write or sendfile() anyway?
> 
> I care less about static benchmark tests on local networks with no latency
> than I do about real life web sites with loads of long haul, slow clients,
> downloading large files consuming an expensive resource:  a thread slot
> and stack.
> 
> It's one of those cases where my opinion is that the wins are in
> real-world usability, and correctness (so far it looks like the best
> solution for the graceful stuff) and the cost might be a modicum of
> performance loss on static localnet benchmarks. 
> 
> Also, this helps keep-alive connections.  We could plop connections back
> up to the event-thread to wait for any more input... rather than consuming
> a thread for 15s...
> 
> We can make the cutoff point different if you're concerned over the extra
> context switches... but really -- for responses over SO_SNDBUF, the worker
> thread has to block at least once.  It may as well pass the work to the
> event-thread to be aggregated with other similar work before blocking. 
> 
> What would be extra cool would be LIFO semantics on threads trying to
> dequeue from the request queue -- if we could service multiple requests in
> one time slice on one thread that would be way nice.
> 
> So far my largest concern, which is the same as my concern with entirely
> select-based servers, is disk i/o.  This all works wonderfully if you
> rarely have to page from disk.  But for servers with large working sets,
> aggregating like this hurts because you have only a single i/o request
> outstanding at a time... with multiple processes we alleviate some of this
> problem... there are probably other options -- but I think this is
> something we can deal with when we get there. 
> 
> Dean
> 
> 


Re: fix for hybrid server problems.

Posted by Dean Gaudet <dg...@arctic.org>.
On Sun, 9 May 1999, Manoj Kasichainula wrote:

> My understanding is that a select()-based server is fast because you
                                                   ^^
> don't have to deal with many context switches, right?

s/is/can be/

> In this case though, we have at least 1 forced context switch for
> every connection (to pass a new connection to a worker thread). And
> for the slightly big files, we have to jump back to the event thread
> and back again to a worker thread for logging.
> 
> With long files, you avoid the context switching between threads as
> they write out the data, but couldn't this be largely eliminated with
> mmap+write or sendfile() anyway?

I care less about static benchmark tests on local networks with no latency
than I do about real life web sites with loads of long haul, slow clients,
downloading large files consuming an expensive resource:  a thread slot
and stack.

It's one of those cases where my opinion is that the wins are in
real-world usability, and correctness (so far it looks like the best
solution for the graceful stuff) and the cost might be a modicum of
performance loss on static localnet benchmarks. 

Also, this helps keep-alive connections.  We could plop connections back
up to the event-thread to wait for any more input... rather than consuming
a thread for 15s...

We can make the cutoff point different if you're concerned over the extra
context switches... but really -- for responses over SO_SNDBUF, the worker
thread has to block at least once.  It may as well pass the work to the
event-thread to be aggregated with other similar work before blocking. 

What would be extra cool would be LIFO semantics on threads trying to
dequeue from the request queue -- if we could service multiple requests in
one time slice on one thread that would be way nice.

So far my largest concern, which is the same as my concern with entirely
select-based servers, is disk i/o.  This all works wonderfully if you
rarely have to page from disk.  But for servers with large working sets,
aggregating like this hurts because you have only a single i/o request
outstanding at a time... with multiple processes we alleviate some of this
problem... there are probably other options -- but I think this is
something we can deal with when we get there. 

Dean



Re: fix for hybrid server problems.

Posted by Manoj Kasichainula <ma...@io.com>.
On Mon, May 03, 1999 at 12:05:22PM -0700, Dean Gaudet wrote:
> The event thread communicates with worker threads through two queues --
> the request queue, and the response queue.
> 
> Implement the request queue using whatever pthread synchronization
> method seems appropriate.  The request queue can contain two different
> data items -- a new connection, or a request_rec of a finished static
> response (with extra info needed for logging).
> 
> The response queue contains request_rec's and the assorted fd/mmap info
> needed to send the static response.  The response queue is implemented
> using a pipe so that the event thread can use select() to find out when
> it has events.  We can actually write "void *"s onto the pipe.

My understanding is that a select()-based server is fast because you
don't have to deal with many context switches, right?

In this case though, we have at least 1 forced context switch for
every connection (to pass a new connection to a worker thread). And
for the slightly big files, we have to jump back to the event thread
and back again to a worker thread for logging.

With long files, you avoid the context switching between threads as
they write out the data, but couldn't this be largely eliminated with
mmap+write or sendfile() anyway?
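
(For reference, the mmap+write variant I mean -- error paths and
partial-write handling omitted:)

    #include <sys/types.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    int send_file_mmap(int sock, const char *path)
    {
        struct stat st;
        void *p;
        int fd = open(path, O_RDONLY);

        if (fd < 0 || fstat(fd, &st) < 0)
            return -1;
        p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);                /* the mapping survives the close */
        if (p == MAP_FAILED)
            return -1;
        /* one write, no read() copy loop; can still block while
         * the kernel faults the pages in from disk */
        write(sock, p, st.st_size);
        munmap(p, st.st_size);
        return 0;
    }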

-- 
Manoj Kasichainula - manojk at io dot com - http://www.io.com/~manojk/
"When you say `I wrote a program that crashed Windows', people just stare at
you blankly and say `Hey, I got those with the system, *for free*'"
  -- Linus Torvalds

Re: fix for hybrid server problems.

Posted by Dan Kegel <da...@alumni.caltech.edu>.
Tony Finch wrote:
> 
> "Brad Fitzpatrick" <br...@bradfitz.com> wrote:
> >> Can we also coax you into proposing using sendfile() on
> >> operating systems that support it?
> >
> >Curious ... Which systems does this include?
> 
> At least FreeBSD, Linux, HP-UX.

NT, too.
- Dan

Re: fix for hybrid server problems.

Posted by Tony Finch <do...@dotat.at>.
"Brad Fitzpatrick" <br...@bradfitz.com> wrote:
>> Can we also coax you into proposing using sendfile() on 
>> operating systems that support it?
>
>Curious ... Which systems does this include?

At least FreeBSD, Linux, HP-UX.

Tony.
-- 
f.a.n.finch   dot@dotat.at   fanf@demon.net
Arthur: "Oh, that sounds better, have you worked out the controls?"
Ford:   "No, we just stopped playing with them."

RE: fix for hybrid server problems.

Posted by Brad Fitzpatrick <br...@bradfitz.com>.
> Can we also coax you into proposing using sendfile() on 
> operating systems that support it?

Curious ... Which systems does this include?

> I'd like Apache to clean IIS's clock on static file benchmarks...

:)


- Brad

Re: fix for hybrid server problems.

Posted by Dean Gaudet <dg...@arctic.org>.

On Tue, 4 May 1999, Dan Kegel wrote:

> Can we also coax you into
> proposing using sendfile() on operating systems that support it?

I think some versions of sendfile() interact poorly with non-blocking
sockets, so on those we couldn't use it.  But otherwise yeah. 

Dean


Re: fix for hybrid server problems.

Posted by Dan Kegel <da...@alumni.caltech.edu>.
Dean Gaudet wrote:
> You'll be surprised to hear me bring up an acceptor thread again... but
> I have a reason for returning its existence:  to service static requests.
> Each process will have one "event thread" which runs select on:
> 
> - a socket/pipe connected to the parent
> - a socket/pipe used for the response queue
> - all listening sockets/pipes
> - all in-progress static responses (i.e. copy an fd or mmap out to the
>   client)

That should take care of the people (like me) who keep asking "Why can't
Apache be more like thttpd?".   Can we also coax you into
proposing using sendfile() on operating systems that support it?
I'd like Apache to clean IIS's clock on static file benchmarks...
- Dan

Re: fix for hybrid server problems.

Posted by Dean Gaudet <dg...@arctic.org>.
To be honest, I don't think we should preserve the current semantics of
HUP and USR1, MaxRequestsPerChild, .... Let's tie together a few ideas
and see how they fit.

You'll be surprised to hear me bring up an acceptor thread again... but
I have a reason for returning its existence:  to service static requests.
Each process will have one "event thread" which runs select on:

- a socket/pipe connected to the parent
- a socket/pipe used for the response queue
- all listening sockets/pipes
- all in-progress static responses (i.e. copy an fd or mmap out to the
  client)

Supporting all that should amortize the cost of the event thread.

The event thread communicates with worker threads through two queues --
the request queue, and the response queue.

Implement the request queue using whatever pthread synchronization
method seems appropriate.  The request queue can contain two different
data items -- a new connection, or a request_rec of a finished static
response (with extra info needed for logging).

The response queue contains request_rec's and the assorted fd/mmap info
needed to send the static response.  The response queue is implemented
using a pipe so that the event thread can use select() to find out when
it has events.  We can actually write "void *"s onto the pipe.

We only use the response queue for responses that send more than SO_SNDBUF
bytes of data.  No sense doing it for anything less, they won't have to
block, just write() directly and proceed to logging.
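
i.e. the worker's side of the response queue is just (sketch):

    #include <unistd.h>

    /* Worker: hand a finished-but-unsent response to the event
     * thread.  Both ends are threads in one process, so passing a
     * raw pointer is fine, and a write of sizeof(void *) bytes is
     * far below PIPE_BUF, hence atomic. */
    void queue_response(int pipe_wr, void *r)   /* r: request_rec * */
    {
        write(pipe_wr, &r, sizeof(r));
    }

    /* Event thread, after select() says pipe_rd is readable: */
    void *dequeue_response(int pipe_rd)
    {
        void *r;
        read(pipe_rd, &r, sizeof(r));
        return r;
    }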

Add a new API phase -- "graceful shutdown".  This is invoked when
the parent asks the process to shutdown -- remember to a httpd child
there is no distinction between graceful shutdown or graceful restart.
When the graceful shutdown occurs, modules (and the core) should close
up any resource which might prevent another instance of the webserver
from starting... such as listening sockets.  The actual shutdown
won't occur until the last thread exits.  We can send events to the
event thread each time a thread exits... I think ... so the event
thread can take care of doing the final shutdown when the last thread
exits.
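
Module-side I'd expect the new phase to look much like the existing
child_exit hook (hypothetical):

    /* what a module might do in the graceful-shutdown phase */
    static int my_listen_fd;

    static void my_graceful_shutdown(server_rec *s, pool *p)
    {
        /* Release anything that would keep a *new* instance of the
         * server from starting: listening sockets, lock files, shm
         * segments.  Don't tear down pools that in-flight requests
         * still reference -- that happens only after the last
         * thread exits. */
        close(my_listen_fd);
    }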

graceful restart and graceful shutdown are the two suggested forms
of restarting and shutting down the server.  They're the safe forms.
We can support a non-graceful shutdown... for this I don't care if we use
signals and destroy threads left and right, it's what the admin asked for.
(If you watch admins you'll notice many of them are lazy and use kill
-9 left and right, so this bad behaviour is pretty normal).

MaxRequestsPerChild (perthread, whatever) are best guesses.  We
don't guarantee that we'll hit them dead on.  When a worker thread
notices MaxRequestsPerChild has been hit, it sends an event to the
event thread, the event thread initiates a graceful shutdown of
the child process.

logging:  yeah, there is difficulty with log rotation if graceful
restart is the only restart available -- there's no way for an
external program to know that all children have finished writing
the rotated logs.  There's a few possibilities for solving this...

Dean