Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/11/19 03:16:54 UTC
[Bug 3983] New: adopt Apache preforking algorithm
http://bugzilla.spamassassin.org/show_bug.cgi?id=3983
Summary: adopt Apache preforking algorithm
Product: Spamassassin
Version: 3.0.1
Platform: Other
OS/Version: other
Status: NEW
Severity: major
Priority: P5
Component: spamc/spamd
AssignedTo: dev@spamassassin.apache.org
ReportedBy: jm@jmason.org
From discussions with a few people at ApacheCon, I think I have
an idea of what may have been causing swap trouble for a lot of
SpamAssassin users.
By default, SpamAssassin 3.0.0's spamd uses preforking, and starts
5 servers. So we have:
10 MB shared across all processes
+ 20 MB unshared x 5
= 110 MB used by all processes
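That arithmetic can be written out as a quick sketch (Python here purely for illustration; spamd itself is Perl, and the 10 MB / 20 MB figures are the rough estimates from above, not measurements):

```python
# Rough memory-footprint estimate for a preforking daemon:
# shared (copy-on-write) pages are counted once, private pages
# once per child. Figures are the estimates from the post.
SHARED_MB = 10      # pages shared across all processes
UNSHARED_MB = 20    # private pages per child
CHILDREN = 5        # spamd 3.0.0's default number of servers

total_mb = SHARED_MB + UNSHARED_MB * CHILDREN
print(total_mb)  # 110
```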
In addition, it's become clear that Linux (and possibly other OSes)
will schedule each of the preforked servers to service an incoming
accept() call *in turn* -- even if some of those servers are swapped out.
Now, let's say you're running that on a server with 90 MB of free RAM. At
least one or two of those servers will have to be swapped out to fit them all
in. Once enough requests arrive -- even under low load -- each swapped-out
server will be swapped back in, and another swapped out, every 5 requests;
in other words, this creates swap load.
What I'm proposing to do is to adopt the Apache preforking algorithm.
This will mean:
- instead of the requests being distributed across all servers equally,
most requests are delivered to the first idle server (sorted by PID).
In other words, a small number of the running servers will handle
most of the requests under normal load. This will reduce swapping,
since the inactive servers stay inactive -- and therefore swapped
out -- until they're needed.
I think this is the main fix that'll improve response time and cut
swapping.
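The "first idle server (sorted by PID)" rule above amounts to the following sketch (Python for illustration only; this is not spamd's actual code, and the dict-of-states representation is assumed):

```python
# "Lowest-PID idle child accepts": the parent tracks each child's
# state and hands the next connection to the idle child with the
# smallest PID, so the same few children stay hot under light load.
def pick_child(states):
    """states: dict mapping pid -> 'idle' or 'busy'."""
    idle = [pid for pid, state in states.items() if state == "idle"]
    return min(idle) if idle else None

print(pick_child({101: "busy", 102: "idle", 103: "idle"}))  # 102
```

Because high-PID children are only picked when everything below them is busy, they stay cold -- and if they've been swapped out, they stay out.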
- the number of servers running will no longer be simply == max-children;
instead, it'll scale from (1 .. max-children) based on how busy the
server is, using a min-idle-children and max-idle-children pair
of settings. In other words, under low load, server children
are killed off; under high load, new servers are forked in advance
to deal with more simultaneous requests.
This is implemented, and *seems* to work well, although it could do
with testing under variable loads.
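The scaling rule can be sketched roughly like this (again Python for illustration; the min-idle-children / max-idle-children names come from the proposal above, but the function shape and defaults here are assumptions, not the patch's code):

```python
# Apache-style idle-server maintenance: keep the number of idle
# children between min_idle and max_idle, never exceeding
# max_children total.
def adjust(idle, total, min_idle=1, max_idle=2, max_children=5):
    """Return how many children to fork (+) or kill (-)."""
    if idle < min_idle and total < max_children:
        # Too busy: fork spares in advance, up to max-children.
        return min(min_idle - idle, max_children - total)
    if idle > max_idle:
        # Too quiet: kill off surplus idle children.
        return -(idle - max_idle)
    return 0

print(adjust(idle=0, total=3))  # 1  (fork one spare)
print(adjust(idle=4, total=5))  # -2 (kill two idle children)
print(adjust(idle=1, total=3))  # 0  (within bounds, do nothing)
```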
- the master process serialises all accept() calls. The parent spamd
entirely controls when, and if, a child accepts a new connection, and
which child does so.
This is required to implement "lowest-PID idle child accepts", and --
surprisingly -- has very little effect on throughput in my testing;
in a simple speed test using about 100 messages, I get these figures:
3.0: 10.819 8.441 9.419 8.865 9.704 = 47.248/5 = 9.45
new: 9.851 9.277 9.759 = 28.887/3 = 9.63
that's 9.45 messages/sec with the 3.0 code, 9.63 m/s with the new
preforking algorithm. However, I think there's actually a small
slowdown compared to non-preforking, based on this:
rrn: 10.240 9.393 9.339 10.033 = 39.005/4 = 9.75
The methodology used to generate these, btw, is just: start spamd, then do
"for mail in [100-message corpus] ; do spamc < $mail ; done" -- in other
words, send one message through at a time, in sequence.
Note the *lower* variability in the m/s figures using the new code. Even
though my machine has 1 GB of RAM, the re-use of the same spamd child
processes seems to increase the predictability of scan times, even without
swapping!
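The serialised-accept control flow described at the top of this bullet can be sketched as follows (Python, in-process pipes standing in for the parent-to-child channels; the one-byte "go accept" protocol is invented for illustration and is not what the patch does on the wire):

```python
# Parent-serialised accept(): the parent owns the listening socket
# and each child blocks on its own pipe until the parent writes a
# byte telling exactly one child to call accept().
import os

def parent_dispatch(child_pipes, states):
    """Tell the lowest-PID idle child to accept the next connection."""
    idle = sorted(pid for pid, state in states.items() if state == "idle")
    if not idle:
        return None          # all children busy; connection waits
    pid = idle[0]
    os.write(child_pipes[pid], b"A")  # one byte = "you may accept now"
    states[pid] = "busy"
    return pid

# Demo: two fake children, only PID 102 idle.
r1, w1 = os.pipe()
r2, w2 = os.pipe()
pipes = {101: w1, 102: w2}
states = {101: "busy", 102: "idle"}
print(parent_dispatch(pipes, states))  # 102
print(os.read(r2, 1))                  # b'A'
```

Since only the parent ever decides who accepts, the kernel's round-robin wakeup of all children (the swap-thrash trigger described earlier) never comes into play.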
- a bonus: we'll have a back-channel from the spamd children to the spamd
parent to send messages and report stats. this should be pretty
damn useful ;)
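A child-to-parent back-channel of this kind might look like the following sketch (Python; the "pid state" line format is invented for illustration -- the post doesn't specify a message format):

```python
# Child -> parent back-channel: children write newline-terminated
# status messages to a pipe the parent reads and parses.
import os

r, w = os.pipe()

def child_report(fd, pid, state):
    """Send one 'pid state' status line to the parent."""
    os.write(fd, f"{pid} {state}\n".encode())

child_report(w, 102, "idle")
child_report(w, 102, "busy")

msg = os.read(r, 64).decode()
print(msg.splitlines()[0])  # 102 idle
```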
So, I'll upload the current state of the patch -- I hacked it up over the
last few days and finished it, more or less, last night. I'd really appreciate
some people trying it out...
Regarding target: 3.1.0 at least. But the way I did it is pretty nonintrusive
-- in my opinion it could even fit into 3.0.x ;)