Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/11/19 03:16:54 UTC

[Bug 3983] New: adopt Apache preforking algorithm

http://bugzilla.spamassassin.org/show_bug.cgi?id=3983

           Summary: adopt Apache preforking algorithm
           Product: Spamassassin
           Version: 3.0.1
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: major
          Priority: P5
         Component: spamc/spamd
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: jm@jmason.org


From discussions with a few people at ApacheCon, I think I have
an idea of what may have caused swapping trouble for a lot of people
using SpamAssassin.

By default, SpamAssassin 3.0.0's spamd uses preforking, and starts
5 servers.  So we have:

    10 MB shared between all processes (counted once)
    20 MB unshared x 5 processes
    = 110 MB used by all processes
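For anyone checking the figures: shared pages are counted once, unshared pages once per child. Here is a minimal sketch of that arithmetic, which also works out how many children fit in the 90 MB example from the next paragraph. It ignores the parent process and the OS itself, and the helper names are hypothetical, purely for illustration.

```python
def total_mb(shared_mb, unshared_mb, children):
    """Total memory: shared pages counted once, unshared pages once per child."""
    return shared_mb + unshared_mb * children


def children_that_fit(free_mb, shared_mb, unshared_mb):
    """How many children can be fully resident in free_mb of RAM
    (simplified: ignores the parent process and the OS itself)."""
    if free_mb < shared_mb:
        return 0
    return (free_mb - shared_mb) // unshared_mb


print(total_mb(10, 20, 5))            # 110
print(children_that_fit(90, 10, 20))  # 4, so at least 1 of the 5 must be swapped out
```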

In addition, something that's become clear is that Linux (and possibly other
OSes) will schedule each of the preforked servers to service an incoming
accept() call *in turn*, even if some of those are swapped out.

Now, let's say you're running that on a server with 90 MB of free RAM.  At
least one or two of those servers will have to be swapped out to fit them all
in.  Once enough requests arrive -- even under low load -- that server will be
swapped back in and another one swapped out, and since each server gets its turn
every 5 requests, this keeps happening... in other words, the scheduling itself
creates swap load.
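A toy model makes the effect easier to see. This is a sketch under loud assumptions, not how the Linux VM actually works: RAM holds exactly 4 of the 5 children (as in the 90 MB example), the kernel evicts whichever child was used least recently, and requests arrive one at a time. `count_swap_ins` is a hypothetical helper written for illustration.

```python
from collections import OrderedDict


def count_swap_ins(schedule, resident_slots):
    """Count how often the child picked for a request is not in RAM,
    assuming least-recently-used eviction. The initial load counts too."""
    resident = OrderedDict()  # child id -> None, least recently used first
    swap_ins = 0
    for child in schedule:
        if child in resident:
            resident.move_to_end(child)      # now the most recently used
            continue
        swap_ins += 1                        # child has to be paged in
        if len(resident) >= resident_slots:
            resident.popitem(last=False)     # evict the least recently used
        resident[child] = None
    return swap_ins


CHILDREN, SLOTS, REQUESTS = 5, 4, 100

# Round-robin: each incoming accept() goes to the next child in turn.
round_robin = [i % CHILDREN for i in range(REQUESTS)]
# Low, sequential load where the same child is always picked.
same_child = [0] * REQUESTS

print(count_swap_ins(round_robin, SLOTS))  # 100
print(count_swap_ins(same_child, SLOTS))   # 1
```

In this model round-robin is the worst case: every request lands on the one child that is currently swapped out, so all 100 requests page a child in, while always reusing the same child (as the lowest-PID-first scheme proposed below would under sequential load) costs a single initial load. Real eviction works on pages, not whole processes, so treat the numbers as illustrative only.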

What I'm proposing to do is to adopt the Apache preforking algorithm.
This will mean:

- instead of the requests being distributed across all servers equally,
  most requests are delivered to the first idle server (sorted by PID).
  In other words, a small number of the running servers will handle
  most of the requests under normal load.  This will reduce swapping,
  since the inactive servers stay inactive, and can therefore stay swapped
  out until they're needed.

  I think this is the main fix that'll improve response time and cut
  swapping.

- the number of servers running will no longer be simply == max-children;
  instead, it'll scale from (1 .. max-children) based on how busy the
  server is, using a min-idle-children and max-idle-children pair
  of settings.  In other words, under low load, surplus idle children
  are killed off; under high load, new servers are forked in advance
  to deal with more simultaneous requests.

  This is implemented, and *seems* to work well, although it could do
  with testing under variable loads.

- the master process serialises all accept() calls.  The parent spamd
  entirely controls when, and if, a child accepts a new connection, and
  which child does so.

  This is required to implement "lowest-PID idle child accepts", and --
  surprisingly -- has very little effect on throughput in my testing;
  in a simple speed test using about 100 messages, I get these figures:

  3.0: 10.819 8.441 9.419 8.865 9.704 = 47.248/5 =  9.45
  new: 9.851 9.277 9.759              = 28.887/3 =  9.63

  That's 9.45 messages/sec with the 3.0 code and 9.63 m/s with the new
  preforking algorithm.  However, I think there's actually a small
  slowdown compared to non-preforking, based on this:

  rrn: 10.240 9.393 9.339 10.033      = 39.005/4 =  9.75

  The methodology used to generate these, btw, is simply: start spamd, then run
  "for mail in [100-message corpus] ; do spamc < $mail ; done".  In other words,
  send one message through at a time, in sequence.

  Note the *lower* variability in the m/s figures using the new code.  Even
  though my machine has 1 GB of RAM, the reuse of the same spamd child
  processes seems to increase the predictability of scan times, even without
  swapping!

- a bonus: we'll have a back-channel from the spamd children to the spamd
  parent to send messages and report stats.   this should be pretty
  damn useful ;)
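The three mechanisms in the list above (lowest-PID idle child first, scaling the pool between min-idle-children and max-idle-children, and the parent deciding which child accepts each connection) can be modelled together. This is a minimal Python sketch, not spamd's actual Perl code: the class and method names are hypothetical, children are plain integers standing in for PIDs, fork() and kill() are simulated, and `finished()` stands in for the child's idle report over the back-channel.

```python
class PreforkPool:
    """Toy model of a parent process that decides which child accepts each
    connection. Children are plain integers standing in for PIDs; fork() and
    kill() are simulated. Assumes min_idle <= max_idle <= max_children."""

    def __init__(self, min_idle, max_idle, max_children, first_pid=1000):
        self.min_idle = min_idle
        self.max_idle = max_idle
        self.max_children = max_children
        self.next_pid = first_pid
        self.idle = set()   # children waiting for work
        self.busy = set()   # children currently scanning a message
        self.adjust()

    def spawn(self):
        """Stand-in for fork(): each new child gets a higher PID and starts idle."""
        self.idle.add(self.next_pid)
        self.next_pid += 1

    def adjust(self):
        """Keep the number of idle children between min_idle and max_idle."""
        while (len(self.idle) < self.min_idle
               and len(self.idle) + len(self.busy) < self.max_children):
            self.spawn()
        while len(self.idle) > self.max_idle:
            # Kill the highest-PID idle child, so the low-PID children that
            # get most of the work stay alive (and stay in RAM).
            self.idle.remove(max(self.idle))

    def dispatch(self):
        """Give the next connection to the lowest-PID idle child.
        Returns that child's PID, or None if every child is busy."""
        if not self.idle:
            return None
        pid = min(self.idle)
        self.idle.remove(pid)
        self.busy.add(pid)
        self.adjust()  # fork ahead if the idle pool just dropped too low
        return pid

    def finished(self, pid):
        """The child reports (over the back-channel) that it is idle again."""
        self.busy.remove(pid)
        self.idle.add(pid)
        self.adjust()  # kill off surplus idle children


pool = PreforkPool(min_idle=1, max_idle=2, max_children=5)

# Sequential load, one message at a time: the same child handles everything.
handled = []
for _ in range(3):
    pid = pool.dispatch()
    handled.append(pid)
    pool.finished(pid)
print(handled)                            # [1000, 1000, 1000]

# A burst of three simultaneous connections forces new children to be forked...
burst = [pool.dispatch() for _ in range(3)]
print(burst)                              # [1000, 1001, 1002]

# ...and once they finish, the pool shrinks back to max_idle children.
for pid in burst:
    pool.finished(pid)
print(sorted(pool.idle), len(pool.busy))  # [1000, 1001] 0
```

A real server would typically also rate-limit how fast it forks and kills children rather than adjusting instantly, as this sketch does.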


So, I'll upload the current state of the patch -- I hacked it up over the
last few days and finished it, more or less, last night.  I'd really appreciate
some people trying it out...

Regarding the target: 3.1.0 at least.  But the way I did it is pretty non-intrusive
-- in my opinion it could even fit into 3.0.x ;)


