Posted to dev@spamassassin.apache.org by Theo Van Dinter <fe...@kluge.net> on 2004/02/21 04:51:43 UTC

spamd optimization idea

I was thinking about this idea this morning.  Instead of fork()ing off
a process per incoming message, what if we initially fork()ed off -m
spamd processes, and in the same way we do mass-check w/ multiple procs,
we call out to the children as work requests come in?

I'd think about threading, but not all perls are thread-capable (most?)...
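A minimal sketch of the prefork model, written in Python rather than Perl for brevity. The parent binds one listening socket and forks a fixed number of children up front, which is the role -m would play. Each child then loops in accept() on the shared socket, roughly what Apache httpd's prefork MPM and Net::Server::PreFork do. `prefork_server` and `handle` are hypothetical names. `handle` only echoes the request with the worker's pid instead of scanning a message, and accept serialization and child supervision are left out.

```python
import os
import socket

def handle(conn):
    # Hypothetical stand-in for spamd's per-message work: read one request
    # and answer with this worker's pid so the caller can see who served it.
    data = conn.recv(4096)
    conn.sendall(str(os.getpid()).encode() + b":" + data)

def prefork_server(num_children, host="127.0.0.1", port=0):
    """Bind a listening socket once, then fork num_children workers that
    all block in accept() on it. Returns (port, list of child pids)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(16)
    pids = []
    for _ in range(num_children):
        pid = os.fork()
        if pid == 0:
            # Child: handle connections until killed; never return to the
            # caller, even if handle() raises.
            try:
                while True:
                    conn, _addr = listener.accept()
                    with conn:
                        handle(conn)
            finally:
                os._exit(0)
        pids.append(pid)
    port = listener.getsockname()[1]
    listener.close()  # the parent never accepts; children keep their copies
    return port, pids
```

The fork cost is paid once per child instead of once per message. The parent stays free to monitor or replace children.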

-- 
Randomly Generated Tagline:
"These periods are always 15 minutes shorter than I'd like them, and 
 probably 15 minutes longer than you'd like them."   - Prof. Van Bluemel

Re: spamd optimization idea

Posted by Kelsey Cummings <kg...@sonic.net>.
On Sat, Feb 21, 2004 at 01:13:02PM -0500, Theo Van Dinter wrote:
> On Sat, Feb 21, 2004 at 12:53:46PM -0500, Theo Van Dinter wrote:
> > As I said, it'd be nice if we could use some of the httpd code to do
> > this, since this is exactly what they do... and since we're all in the
> > ASF and using the same license now ... ;)
> 
> Actually, poking around for a minute, Net::Server does what we want
> already (specifically the PreFork method).  And it's even an all-perl
> solution.  ;)

Some quick thoughts:

Preforking is good for a bit of extra performance, but with copy-on-write
it's not that much, is it?  And this will completely break spamd's ability
to setuid to the user to work on top of a traditional unix system.  Or
rather, you can only use a preforking method if you are running with
virtual users or SQL users, etc.  To work around this, SA would have to
change only the effective user and not the real user in its setuid calls
(so it could return to root and then setuid to another user.)  [I can't
remember if that's even possible.]  This would leave a lot of code and
dependent modules running as root, whereas at this time only a tiny bit
of code runs as root after the fork.  Since I helped write that code, I'm
quite confident that it's secure.
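The workaround described above can be sketched in Python, assuming a POSIX system. It changes only the effective uid/gid around the per-user work, so a process that started as root can switch back and serve the next user. `run_with_euid` is a hypothetical helper. It keeps root recoverable for the whole life of the process, which is exactly the security cost Kelsey points out. Supplementary groups are not handled in this sketch.

```python
import os

def run_with_euid(uid, gid, fn):
    """Run fn() with only the *effective* uid/gid switched, then restore
    them. Unlike a permanent os.setuid() drop, this can be undone, which
    is what a long-lived preforked child would need to serve many users."""
    saved_euid, saved_egid = os.geteuid(), os.getegid()
    # Change the group first: once the effective uid is no longer root,
    # the process would lose the right to change its effective gid.
    os.setegid(gid)
    try:
        os.seteuid(uid)
        return fn()
    finally:
        # Restore in the opposite order: regain the uid, then the gid.
        # Note: supplementary groups are left untouched in this sketch.
        os.seteuid(saved_euid)
        os.setegid(saved_egid)
```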

-- 
Kelsey Cummings - kgc@sonic.net           sonic.net, inc.
System Administrator                      2260 Apollo Way
707.522.1000 (Voice)                      Santa Rosa, CA 95407
707.547.2199 (Fax)                        http://www.sonic.net/
Fingerprint = D5F9 667F 5D32 7347 0B79  8DB7 2B42 86B6 4E2C 3896

Re: spamd optimization idea

Posted by Theo Van Dinter <fe...@kluge.net>.
On Sat, Feb 21, 2004 at 12:53:46PM -0500, Theo Van Dinter wrote:
> As I said, it'd be nice if we could use some of the httpd code to do
> this, since this is exactly what they do... and since we're all in the
> ASF and using the same license now ... ;)

Actually, poking around for a minute, Net::Server does what we want
already (specifically the PreFork method).  And it's even an all-perl
solution.  ;)

-- 
Randomly Generated Tagline:
"Great is the art of beginning, but greater is the art of ending." - Longfellow

Re: spamd optimization idea

Posted by Theo Van Dinter <fe...@kluge.net>.
On Fri, Feb 20, 2004 at 11:48:03PM -0800, Dan Quinlan wrote:
> spamd forks off one process per incoming message?  (I should disclose
> that I don't run spamd.)

Yeah.  It does the traditional server model.

> It's crazy to fork for every check for servers written in C (since the
> days of NCSA httpd).  In Perl it's serious nutball material.  100%
> agreement that we should change this.

Well, it's not super terrible, you just have the cost of the fork()
(it's not like perl has to re-compile all the modules).  It works ok,
but as usual can be made more efficient. :)

> It might make sense to generalize how it works and share code between
> mass-check and spamd.

That may be possible.  In mass-check we basically spawn X kids and for
each spawn we throw a target mailbox+offset (on the filesystem) at them.
We then wait for a response from any child, and when we get one, throw
the result into the log file, and send that child another mailbox+offset.
That happens until we either run out of messages or we need to restart
the kids (--restart, which causes a "ok, kill yourself now" message to
be sent out, then the spawn thing happens again, etc.)
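The mass-check loop described above could look roughly like this minimal Python sketch. It is an illustration, not mass-check's actual code. Each pre-forked child is connected to the parent by a socketpair, and the parent hands one job to each idle child. It then waits with select() for whichever child answers first, records the result, and gives that child the next job. At the end, each child receives an "exit" message. `work`, `spawn_child` and `dispatch` are hypothetical names. Jobs are plain strings standing in for mailbox+offset pairs and are assumed to be unique. The --restart cycle is omitted.

```python
import os
import select
import socket

def work(job):
    # Hypothetical stand-in for scanning the message at a mailbox+offset;
    # here we just upper-case the job string.
    return job.upper()

def spawn_child():
    """Fork one worker connected to the parent by a socketpair."""
    parent_end, child_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        parent_end.close()
        f = child_end.makefile("rw")
        for line in f:
            job = line.rstrip("\n")
            if job == "exit":          # the "ok, kill yourself now" message
                break
            f.write(work(job) + "\n")
            f.flush()
        os._exit(0)
    child_end.close()
    return pid, parent_end, parent_end.makefile("rw")

def dispatch(jobs, num_children):
    """Feed jobs to pre-forked children, always giving the next job to
    whichever child answers first. Returns {job: result}."""
    if num_children < 1:
        raise ValueError("need at least one child")
    pending = list(jobs)
    results = {}
    busy = {}                          # socket -> (pid, file, job in flight)
    idle = [spawn_child() for _ in range(num_children)]
    while pending or busy:
        # Hand out work to every idle child while jobs remain.
        while idle and pending:
            pid, sock, f = idle.pop()
            job = pending.pop(0)
            f.write(job + "\n")
            f.flush()
            busy[sock] = (pid, f, job)
        # Wait for a response from any busy child.
        ready, _, _ = select.select(list(busy), [], [])
        for sock in ready:
            pid, f, job = busy.pop(sock)
            results[job] = f.readline().rstrip("\n")
            idle.append((pid, sock, f))
    # Out of work: tell each child to exit, then reap it.
    for pid, sock, f in idle:
        f.write("exit\n")
        f.flush()
        f.close()
        sock.close()
        os.waitpid(pid, 0)
    return results
```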

That's sort of copyable for spamd -- except we'd have to track which
child is dealing with what input connection so we can pass the correct
data back to the correct connection.  Very simply, if all the children
are busy, we just stop accepting new connections.  We'd also want to
track msgs/child so we can restart them periodically, and we'd also want
to deal with timeouts for children -- although I'm more inclined to deal
with that in the child than the parent.
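For spamd, the parent would also need the bookkeeping listed above. The following is a minimal Python sketch with hypothetical names. It keeps a table mapping each child pid to the connection it is serving. A busy check tells the parent when to stop accepting. A per-child message count flags a child for restart once it reaches an assumed limit, `max_per_child`.

```python
class ChildTable:
    """Hypothetical parent-side bookkeeping for a preforked spamd."""

    def __init__(self, pids, max_per_child):
        self.max_per_child = max_per_child
        self.conn_for = {pid: None for pid in pids}  # pid -> connection or None
        self.served = {pid: 0 for pid in pids}       # pid -> messages handled

    def can_accept(self):
        # If every child is busy, the parent simply stops calling accept().
        return any(c is None for c in self.conn_for.values())

    def assign(self, conn):
        """Give conn to the first idle child and return its pid."""
        for pid, current in self.conn_for.items():
            if current is None:
                self.conn_for[pid] = conn
                return pid
        raise RuntimeError("all children busy")

    def finish(self, pid):
        """Child pid has answered: return the connection its result belongs
        to, and whether the child has hit its limit and should be restarted."""
        conn = self.conn_for[pid]
        if conn is None:
            raise ValueError(f"child {pid} is not serving a connection")
        self.conn_for[pid] = None
        self.served[pid] += 1
        return conn, self.served[pid] >= self.max_per_child

    def replace(self, old_pid, new_pid):
        # Swap in a freshly forked child after a restart.
        del self.conn_for[old_pid], self.served[old_pid]
        self.conn_for[new_pid] = None
        self.served[new_pid] = 0
```

Keeping the pid-to-connection map in the parent is what lets each child's result go back to the right client.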

As I said, it'd be nice if we could use some of the httpd code to do
this, since this is exactly what they do... and since we're all in the
ASF and using the same license now ... ;)

Then again, this shouldn't be too difficult to get going, so ...

-- 
Randomly Generated Tagline:
I think $[ is more like a coelacanth than a mastadon.
              -- Larry Wall in <19...@wall.org>

Re: spamd optimization idea

Posted by Daniel Quinlan <qu...@pathname.com>.
Theo Van Dinter <fe...@kluge.net> writes:

> I was thinking about this idea this morning.  Instead of fork()ing off
> a process per incoming message, what if we initially fork()ed off -m
> spamd processes, and in the same way we do mass-check w/ multiple procs,
> we call out to the children as work requests come in?

spamd forks off one process per incoming message?  (I should disclose
that I don't run spamd.)

It's crazy to fork for every check for servers written in C (since the
days of NCSA httpd).  In Perl it's serious nutball material.  100%
agreement that we should change this.

It might make sense to generalize how it works and share code between
mass-check and spamd.

Daniel

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting

RE: spamd optimization idea

Posted by Phillip Evans <Sp...@evanscorp.net.au>.
I've actually done this for Windows (somewhat amusing considering the
tagline).

I'm still working on it, but I already have a Windows service that embeds the
Perl interpreter and handles all of the spamd socket communications.  I have
plans to port this to Linux once I have the Windows version singing, i.e. I
have written the code in C++ with some degree of portability in mind.

I haven't quite got a comparable performance test environment for it yet
(i.e. spamd vs my code on Windows or on Linux), but it's looking to be at
least twice as fast per message with no discernible memory leaks.

This is only a part-time project for me so it's taking a while to round it
out.  It'll probably be another month or so before I have a Linux port
running (currently putting in some queue management stuff in case of
excessive load).

If you're game, you could alpha test for me?

Phil.

 

-----Original Message-----
From: Theo Van Dinter [mailto:felicity@kluge.net] 
Sent: Saturday, 21 February 2004 3:08 PM
To: Gary Funck
Cc: Spamassassin Devel List
Subject: Re: spamd optimization idea

On Fri, Feb 20, 2004 at 07:59:44PM -0800, Gary Funck wrote:
> Isn't there a kind of belt-and-suspenders justification for restarting 
> each sub-process (via fork) - that memory leaks may develop, and by 
> restarting, their effect is reduced.

Yeah, but restarting for every message causes a lot of overhead which
doesn't need to occur.  Allowing the children to exit/restart is pretty
simple.  We do it in mass-check (--restart), and it's the same type of thing
the Apache httpd does.  (wondering if we can use some of that code
actually...  perhaps a C connection manager and perl compute children or
something.)

--
Randomly Generated Tagline:
"I wouldnt trust NT to feed my cat."    - Unknown poster on Slashdot


RE: spamd optimization idea

Posted by Gary Funck <ga...@intrepid.com>.

> From: Theo Van Dinter
> Sent: Friday, February 20, 2004 8:08 PM
[...]
> 
> On Fri, Feb 20, 2004 at 07:59:44PM -0800, Gary Funck wrote:
> > Isn't there a kind of belt-and-suspenders justification for restarting
> > each sub-process (via fork) - that memory leaks may develop, and by
> > restarting, their effect is reduced.
> 
> Yeah, but restarting for every message causes a lot of overhead which
> doesn't need to occur.  Allowing the children to exit/restart is
> pretty simple.  We do it in mass-check (--restart), and it's the same
> type of thing the Apache httpd does.  (wondering if we can use some of
> that code actually...  perhaps a C connection manager and perl compute
> children or something.)

Doesn't MIMEDefang do something like that as well?


Re: spamd optimization idea

Posted by Theo Van Dinter <fe...@kluge.net>.
On Fri, Feb 20, 2004 at 07:59:44PM -0800, Gary Funck wrote:
> Isn't there a kind of belt-and-suspenders justification for restarting
> each sub-process (via fork) - that memory leaks may develop, and by
> restarting, their effect is reduced.

Yeah, but restarting for every message causes a lot of overhead which
doesn't need to occur.  Allowing the children to exit/restart is
pretty simple.  We do it in mass-check (--restart), and it's the same
type of thing the Apache httpd does.  (wondering if we can use some of
that code actually...  perhaps a C connection manager and perl compute
children or something.)
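Here is a minimal Python sketch of letting children exit and be replaced, in the spirit of Apache httpd's MaxRequestsPerChild. Each child serves a fixed number of requests and exits. The parent reaps it with wait() and forks a fresh one, so leaked memory is discarded periodically rather than after every message. `supervise` and `handle_one` are hypothetical names. The `total_restarts` stopping condition exists only so the example terminates.

```python
import os

def supervise(num_children, max_requests, handle_one, total_restarts):
    """Keep num_children workers alive. Each worker calls handle_one()
    max_requests times and exits; the parent then forks a replacement,
    until total_restarts replacements have been made. Returns the total
    number of children forked."""

    def spawn():
        pid = os.fork()
        if pid == 0:
            # Child: serve a bounded number of requests, then exit even if
            # handle_one() raised, so it never returns into parent code.
            try:
                for _ in range(max_requests):
                    handle_one()
            finally:
                os._exit(0)
        return pid

    live = {spawn() for _ in range(num_children)}
    forked = len(live)
    restarts = 0
    while live:
        pid, _status = os.wait()      # block until any child exits
        if pid not in live:
            continue                  # not one of ours
        live.discard(pid)
        if restarts < total_restarts:
            live.add(spawn())
            forked += 1
            restarts += 1
    return forked
```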

-- 
Randomly Generated Tagline:
"I wouldnt trust NT to feed my cat."    - Unknown poster on Slashdot

RE: spamd optimization idea

Posted by Gary Funck <ga...@intrepid.com>.


> From: Theo Van Dinter [mailto:felicity@kluge.net]
> Sent: Friday, February 20, 2004 7:52 PM
>
> I was thinking about this idea this morning.  Instead of fork()ing off
> a process per incoming message, what if we initially fork()ed off -m
> spamd processes, and in the same way we do mass-check w/ multiple procs,
> we call out to the children as work requests come in?
>

Isn't there a kind of belt-and-suspenders justification for restarting
each sub-process (via fork) - that memory leaks may develop, and by
restarting, their effect is reduced.

> I'd think about threading, but not all perls are thread-capable (most?)...

I've read that Perl threads have a lot of overhead if the application that
starts them holds a lot of in-memory state (which spamd does).