Posted to users@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2005/05/05 00:53:13 UTC

Re: memory-usage going BOOM

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


jdow writes:
> From: <da...@umiacs.umd.edu>
> 
> > BTDT, bought the T-shirt. Adding memory will help. Short term solution
> > can be adding swap space. Another option can be running SA remotely 
> > on another machine (users run spamc -d sa.machine.com)
> 
> This might be "A GOOD TIME" for someone to create a small exposition
> regarding spamassassin memory usage. I note that it seems to hover
> around 56 megabyte to 65 megabyte range even when freshly spawned. I
> do not run AWL. I have about 5 megabytes of Bayes data. (I train
> lightly of late and only when something new turns up.) I am running
> a fair number of SARE rule sets, about 1.3 megabytes worth. How does
> this add up to 56 megabytes or more? Does perl take that much space?
> Or does its expansion of the rule sets make it so large?

OK, here's a quick wrap-up of what I've observed regarding memory,
because this has been one of my priorities for 3.1.0:

- - typical perl 5.8.x spamd is about 25-35megs of memory (per "VSZ") using
  the default ruleset.

- - this goes up as you add additional rules. ;)   With rulesets that
  contain a lot of complexity, they can easily add 10MB to in-core size,
  no problem.  Relatively small ruleset sizes on-disk can indeed map
  to a large in-memory footprint, since they're compiled by perl into
  a speed-optimised format.

- - on a Linux server, "top" typically reports a very small amount of memory
  shared between processes ("SHR").  That's a glitch in Linux 2.4.x and
  2.6.x kernels; a lot more RAM is actually shared, it's just not tracked
  as such by the kernel any more, although the documentation still states
  that it is.  (Annoying.)  In fact, about 70% of the memory usage
  of all the spamd children is shared between them.

- - SpamAssassin 3.1.0 is a lot better at effective RAM usage than 3.0.0,
  using its new preforking algorithm.  This is because it (a) runs with a
  smaller number of active children, and (b) keeps one or two servers very
  busy, using the others for "overflow", instead of round-robin serving
  across all the servers equally (which increases swapping).

- - typically if a spamd process is reported to have over 100 MB of RAM
  usage, it's indicating a problem with one of the bits of code that uses
  DB_File -- AWL or Bayes.  Extremely large AWL/Bayes db files can cause a
  spamd process to bloat up massively.  If you're seeing this, go check
  "find" for very large db files...

- - perl 5.6.x uses less RAM than 5.8.x (and is faster ;).

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFCeVJZMJF5cimLx9ARAjHfAJoD5dS3IrxySXbKYJ7MfupgSEbxzACgmG7Z
kCUM/hKJ+vVfp7PdD00AKIo=
=GLR+
-----END PGP SIGNATURE-----
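
The advice above to go looking for very large AWL/Bayes db files can be
sketched in a few lines of Python. This is only an illustration: the
function name, the 50 MB cut-off and the ~/.spamassassin directory are
assumptions and do not come from the message itself.

```python
import os

def find_large_files(root, min_bytes):
    """Return (size, path) pairs for files under root of at least min_bytes, largest first."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if size >= min_bytes:
                hits.append((size, path))
    return sorted(hits, reverse=True)

if __name__ == "__main__":
    # Assumed location of the per-user state files; adjust for your setup.
    state_dir = os.path.expanduser("~/.spamassassin")
    # 50 MB is an arbitrary illustrative threshold, not a documented limit.
    for size, path in find_large_files(state_dir, 50 * 1024 * 1024):
        print(f"{size / 1024 / 1024:8.1f} MB  {path}")
```

Run against whichever directory holds the bayes_* and auto-whitelist
files; an empty result suggests the db files are probably not the cause
of the bloat.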


Re: memory-usage going BOOM

Posted by Patrick von der Hagen <pa...@wudika.de>.
Dennis Davis wrote:
[...]
> Some on this list recommended reducing --max-conn-per-child from the
> default of 200 to reduce possible memory leakage in earlier versions
> of SpamAssassin.  I doubt that this is a problem now, but it might
> be worth trying as a precautionary measure.

I am absolutely CERTAIN that memory leakage is a huge problem at the 
moment. Using SpamAssassin 3.0.2 for three months I witnessed no 
problems at all, but on two consecutive days memory usage went through 
the roof and killed my server. There is no AWL, and even with Bayes 
disabled some spamd processes reached up to 400MB of memory usage. 
After setting "--max-conn-per-child=20" I haven't seen more than 200MB 
for a single spamd, which is still five or even six times as much as 
usual (a little bit more than 30MB).

Stephen M. Przepiora, David Stern and Johann Spies confirmed that they 
have seen "memory-usage going BOOM" recently, and were forced to add more 
memory, to tune --max-conn-per-child, or just to kill every child 
growing over a certain threshold. That's what I'd consider to be "4 
reports of suspected memory leakage in a single day".
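
The last workaround mentioned above, killing children that grow over a
certain threshold, could look roughly like the Python sketch below. It
only selects candidate PIDs from text in the three-column form that
"ps -eo pid,rss,comm" prints (RSS in kilobytes); it does not kill
anything. The function name, the parameters and the 200MB limit are
illustrative assumptions, and the process name may need to be "perl"
rather than "spamd" on some systems.

```python
def children_over_limit(ps_output, limit_kb, name="spamd", exclude=()):
    """Return PIDs of processes matching `name` whose RSS exceeds limit_kb.

    ps_output is text with one process per line: PID, RSS in kB, command.
    PIDs listed in `exclude` (e.g. the parent spamd) are never returned.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or not fields[0].isdigit() or not fields[1].isdigit():
            continue  # header line or something unexpected
        pid, rss_kb, command = int(fields[0]), int(fields[1]), fields[2]
        if pid in exclude:
            continue
        if name in command and rss_kb > limit_kb:
            pids.append(pid)
    return pids
```

The parent spamd should be passed in "exclude" so that only children
are candidates; what to do with the returned PIDs (for example sending
them a TERM signal) is left to the caller.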

Personally I still consider 3.0.2 a recent version of SpamAssassin, and 
scanning through the Changes of 3.0.3 I didn't find "fixed 
huge-memory-leak, upgrade ASAP!"....

I suppose there are spams sent to my server that trigger a 
SpamAssassin bug. I don't know whether the spammers really know that they 
trigger a memory leak, but if they do, I expect that all of us will have 
to face a huge problem quite soon.

-- 
CU,
    Patrick.

Re: memory-usage going BOOM

Posted by Dennis Davis <D....@bath.ac.uk>.
On Wed, 4 May 2005, Justin Mason wrote:

> OK, here's a quick wrap-up of what I've observed regarding memory,
> because this has been one of my priorities for 3.1.0:

...

> - - SpamAssassin 3.1.0 is a lot better at effective RAM usage than
> 3.0.0, using its new preforking algorithm.  This is because it
> (a) runs with a smaller number of active children, and (b) keeps
> one or two servers very busy, using the others for "overflow",
> instead of round-robin serving across all the servers equally
> (which increases swapping).

I've been running SpamAssassin 3.0.2 with Justin's patch, which adds
this functionality, for some time:

http://bugzilla.spamassassin.org/show_bug.cgi?id=3983

It's been in production use on our frontline mail servers and seems
to work well.  The patch also applies to SpamAssassin 3.0.3 and
I'll try it out when I move SpamAssassin 3.0.3 into production use.
Currently I have this patched version on a couple of test servers
and have seen no problems.

Some on this list recommended reducing --max-conn-per-child from the
default of 200 to reduce possible memory leakage in earlier versions
of SpamAssassin.  I doubt that this is a problem now, but it might
be worth trying as a precautionary measure.
-- 
Dennis Davis, BUCS, University of Bath, Bath, BA2 7AY, UK
D.H.Davis@bath.ac.uk               Phone: +44 1225 386101

Re: memory-usage going BOOM

Posted by "Stephen M. Przepiora" <sm...@ncoastsoft.com>.
Hello All,

Justin: SpamAssassin 3.1.0 is a lot better at effective RAM usage than 3.0.0 ... This is because it (a) runs with a
  smaller number of active children ... keeps one or two servers very busy, using the others for "overflow"


We process 60K messages a day and use most of our 10 spamd processes, 
so this "fix" for memory usage will not help us. The only things that 
have worked for us are limiting the number of spammy messages reaching 
SA and killing off the spamd processes after a few scans.

Steve


Re: memory-usage going BOOM

Posted by jdow <jd...@earthlink.net>.
Thanks. That does help me understand what is going on. This might be a
good writeup for the wiki.

{^_^}



Re: memory-usage going BOOM

Posted by Loren Wilton <lw...@earthlink.net>.
> from maillog
> Deep recursion on subroutine "Mail::SpamAssassin::Message::Node::finish"
> at /usr/local/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/Message/Node.pm
> line 659

As far as I know (which may be wrong) the deep recursion thing isn't related
to either bayes or awl expiry.  I seem to recall that the devs had a bug on
this, but nobody could ever reproduce the problem to fix it.  I don't recall
if the bug is still open or got dumped for being unreproducible.

IF you can track this down to a particular message or set of circumstances,
and IF it isn't either a bayes or awl expiry run, then it should be reported
in BZ along with some appropriate documentation or test cases.
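
As a starting point for tracking this down, a small Python sketch like
the one below could list where the warning shows up in a log, so the
line numbers (and whatever timestamps those lines carry) can be matched
against the messages being scanned at the time. The function name is
hypothetical and the log path is whatever your syslog setup uses; only
the 'Deep recursion on subroutine "..."' text is taken from the
report above.

```python
import re
import sys

# Perl's warning text; the subroutine name is captured from inside the quotes.
DEEP_RECURSION = re.compile(r'Deep recursion on subroutine "([^"]+)"')

def find_deep_recursion(log_lines):
    """Return (line_number, subroutine) for each 'Deep recursion' warning."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        match = DEEP_RECURSION.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the path of the mail log as the first argument.
    with open(sys.argv[1], errors="replace") as log:
        for number, sub in find_deep_recursion(log):
            print(f"line {number}: {sub}")
```

This only locates the warnings; deciding whether a given occurrence
coincides with a bayes or awl expiry run still has to be done by hand.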

        Loren


Re: memory-usage going BOOM

Posted by Bikrant Neupane <bi...@wlink.com.np>.
On Thursday 05 May 2005 04:38, Justin Mason wrote:
> [...]

I will be waiting for the 3.1.0 release. In the meantime I am having a big 
problem with spamd (3.0.3) on FreeBSD 5.3 with perl 5.8.5.
I have already reported the problem in a separate thread but alas!!!  I am 
getting no more responses.

I have a Xeon 2.4 GHz with HT and 1536 MB of RAM. I am running 15 spamd 
child processes. The size of each process is normally between 40 and 60MB, 
but sometimes it goes beyond 300-400 MB! At such times I see a "Deep 
Recursion" message in the log files.
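
For a rough idea of what 15 children of this size should cost in normal
operation, the "about 70% shared" figure quoted earlier in the thread
can be plugged into a back-of-the-envelope estimate. The Python sketch
below is only that: the function name and the simple model (the shared
part counted once, the rest once per child) are assumptions, and it
says nothing about a child that has started to bloat.

```python
def estimate_resident_mb(children, per_child_mb, shared_fraction=0.7):
    """Estimate total RAM for `children` processes of per_child_mb each,
    when shared_fraction of every process is shared between all of them."""
    if not 0.0 <= shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be between 0 and 1")
    shared = per_child_mb * shared_fraction            # counted once
    private = per_child_mb * (1.0 - shared_fraction)   # counted per child
    return shared + private * children

# 15 children at 60MB each: naively 15 * 60 = 900MB,
# but only roughly 312MB if 70% really is shared.
print(round(estimate_resident_mb(15, 60)))
```

Under that model the normal load fits comfortably in 1536 MB, which
would point at the occasional 300-400 MB processes rather than the
number of children as the cause of the swapping.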

from top
10152 root       4    0   134M   116M accept 1   1:29  2.29%  2.29% perl

from maillog
Deep recursion on subroutine "Mail::SpamAssassin::Message::Node::finish" 
at /usr/local/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/Message/Node.pm 
line 659

My system is swapping a lot, up to 1GB!!! I installed perl 5.8.5 from 
FreeBSD ports.

Bayes and auto-whitelist are turned on.
-rw-------  1 smtpd  smtpd   192K May  5 10:34 auto-whitelist
-rw-------  1 smtpd  smtpd    12K May  5 10:34 bayes_journal
-rw-------  1 smtpd  smtpd   656K May  5 10:33 bayes_seen
-rw-------  1 smtpd  smtpd   5.1M May  5 10:33 bayes_toks
-rw-r--r--  1 smtpd  smtpd   1.1K May  1 12:24 user_prefs

As you can see, the files are quite small. In my old setup (SA 2.6 on 
Linux) the whitelist file was as big as 400MB, but I never had any problem.

best regards,
Bikrant
