Posted to users@spamassassin.apache.org by David Velásquez Restrepo <da...@conexcol.com> on 2005/05/19 06:18:28 UTC

Re: Simple question TRUE or FALSE (More data to answer this question)

Software:
--------------------------------------------------------------
A Perl script which takes a file and tests it using Mail::SpamAssassin to
get its spam score level
OS: gentoo 2005.0
MTA: postfix

SpamAssassin:
--------------------------------------------------------------
Using: Net test, Bayes, Razor2, DCC, Pyzor, SPF Test (and everything else
suggested by spamassassin)
Rules:
    rules_du_jour:
        http://www.rulesemporium.com/rules/99_FVGT_Tripwire.cf
        http://www.rulesemporium.com/rules/bigevil.cf
        http://mywebpages.comcast.net/mkettler/sa/antidrug.cf
        http://www.rulesemporium.com/rules/evilnumbers.cf
        http://www.stearns.org/sa-blacklist/sa-blacklist.current
        http://www.stearns.org/sa-blacklist/sa-blacklist.current.uri.cf
        http://www.stearns.org/sa-blacklist/random.current.cf
        http://www.timj.co.uk/linux/bogus-virus-warnings.cf
        http://www.rulesemporium.com/rules/70_sare_adult.cf
        http://www.rulesemporium.com/rules/99_sare_fraud_post25x.cf
        http://www.rulesemporium.com/rules/99_sare_fraud_pre25x.cf
        http://www.rulesemporium.com/rules/72_sare_bml_post25x.cf
        http://www.rulesemporium.com/rules/71_sare_bml_pre25x.cf
        http://www.rulesemporium.com/rules/70_sare_ratware.cf
        http://www.rulesemporium.com/rules/70_sare_spoof.cf
        http://www.rulesemporium.com/rules/70_sare_bayes_poison_nxm.cf
        http://www.rulesemporium.com/rules/70_sare_oem.cf
        http://www.rulesemporium.com/rules/70_sare_random.cf
        http://www.rulesemporium.com/rules/70_sare_header.cf
        http://www.rulesemporium.com/rules/70_sare_html.cf
        http://www.rulesemporium.com/rules/70_sare_specific.cf
        http://www.rulesemporium.com/rules/71_sare_redirect_pre3.0.0.cf
        http://www.rulesemporium.com/rules/72_sare_redirect_post3.0.0.cf
        http://www.rulesemporium.com/rules/70_sare_uri0.cf
        http://www.rulesemporium.com/rules/70_sare_uri1.cf
        http://www.rulesemporium.com/rules/70_sare_uri2.cf
        http://www.rulesemporium.com/rules/70_sare_uri3.cf
        http://www.rulesemporium.com/rules/70_sare_uri_eng.cf
        http://www.rulesemporium.com/rules/70_sare_uri_arc.cf

Runtime:
--------------------------------------------------------------
4 processes in parallel mode

Hardware:
--------------------------------------------------------------
Intel Pentium III - 1 GHz - 512 MB RAM (PC133)

top:
---------------------------------------------------------------
top - 23:03:27 up 10:39,  2 users,  load average: 5.47, 5.35, 5.19
Tasks:  62 total,   2 running,  60 sleeping,   0 stopped,   0 zombie
Cpu(s): 93.7% us,  5.7% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.6% hi,  0.0% si
Mem:    514036k total,   490044k used,    23992k free,     6892k buffers
Swap:   987988k total,    49672k used,   938316k free,    38012k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27220 xmail     19   0 98680  71m 3064 R 99.9 14.2   2:38.51 /progs/xmail/bin/mx_parser/mx_parser.pl - 1
27603 xmail     15   0  100m  95m 3064 S 36.8 19.0   2:06.76 /progs/xmail/bin/mx_parser/mx_parser.pl - 5
28171 xmail     16   0 93604  87m 3064 D 28.9 17.4   1:11.20 /progs/xmail/bin/mx_parser/mx_parser.pl - 4
27516 xmail     17   0 94644  88m 3064 D 13.1 17.6   2:03.70 /progs/xmail/bin/mx_parser/mx_parser.pl - 2
27308 xmail     18   0 97960  73m 3064 D 10.5 14.5   2:35.46 /progs/xmail/bin/mx_parser/mx_parser.pl - 3

So, here again is the "simple", but not short, question:
Q) With spamassassin (and all the above info) you need about 20 to 30 
seconds per email message and LOTS of RAM and CPU:
    a) TRUE
    b) FALSE



Re: Simple question TRUE or FALSE (More data to answer this question)

Posted by jdow <jd...@earthlink.net>.
From: "David Velásquez Restrepo" <da...@conexcol.com>

> So, here again is the "simple", but not short, question:
> Q) With spamassassin (and all the above info) you need about 20 to 30
> seconds per email message and LOTS of RAM and CPU:
>     a) TRUE
>     b) FALSE
>

Given the way you phrase that belligerent assertion I am tempted to
simply answer "true" and leave you floundering. It is obvious that for
the way you have it configured you're going to take 20-30 seconds so
the obvious answer is "true", for you. Now, if you asked, "Am I doing
something wrong?" and approached it from that direction you might
discover you can run tests in about 5 to 7 seconds each on your
machine. I'll be presumptuous and figure this is what you really mean.

For the run times you cite you may have a BL configuration problem,
such as trying to use a dead BL somewhere. One other thing that can
cause this is a DNS problem.

You are using larger chunks of VIRT than I am. I use about 60M where
you are using 98M. I run with "--max-conn-per-child=15". You win a
little if you either add RAM or cut down to "-m2" or "-m3". You do
have a fair amount of cache in use. Once that happens you flounder
around in cache swapping when running spamassassin.
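
A rough sketch of the arithmetic behind that suggestion, not a tuning
rule. The function name and the idea of reserving a fixed amount of RAM
for the OS, the MTA and disk cache are assumptions made for illustration;
the per-process figure would come from the RES column in top.

```shell
# Estimate how many scanner processes fit in RAM without swapping.
# $1 = total RAM in MB
# $2 = resident size (RES) of one scanner process in MB
# $3 = MB held back for the OS, MTA and disk cache (an assumed reserve)
max_children() {
    # Integer division: any fractional process is rounded down.
    echo $(( ($1 - $3) / $2 ))
}
```

With 512 MB of RAM, roughly 90 MB per process and an assumed 200 MB
reserve, "max_children 512 90 200" prints 3, which is in line with
dropping to "-m3".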

{^_^}



Re: Simple question TRUE or FALSE (More data to answer this question)

Posted by Matt Kettler <mk...@evi-inc.com>.
David Velásquez Restrepo wrote:
> Software:
> --------------------------------------------------------------
> A Perl script which takes a file and tests it using Mail::SpamAssassin
> to get its spam score level

If your script isn't persistent, I'd ditch it and use spamc/spamd as Justin
Mason suggested.

You'll save a lot of processor time from three things using this approach:
	1) spamd parses the rulesets when it loads, instead of on a per-message basis.
	2) You'll avoid invoking a perl process on a per-message basis, which is a huge
waste of CPU time. The perl processes will be preforked by spamd, and only spamc
(a compiled utility) gets invoked per-message.
	3) spamc has a built-in message size limit, so you'll avoid scanning messages
with large attachments that are unlikely to be spam anyway.
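
A minimal sketch of what the per-message side could look like with spamc,
assuming spamd is already running on the same host. The function name, the
SPAMC override variable and the 256000-byte limit are illustrative
assumptions, not anything from your setup; "-c" (check only, print
score/threshold) and "-s" (maximum message size) are standard spamc
options.

```shell
# Command used to talk to spamd. It can be overridden, so the wrapper can be
# tried without a running daemon (assumption: spamc is on PATH otherwise).
SPAMC="${SPAMC:-spamc}"

# Messages larger than this many bytes are not scanned (hypothetical value).
MAX_SIZE="${MAX_SIZE:-256000}"

# Print "score/threshold" for the message file given as $1.
# spamc exits with status 1 when the message is judged to be spam.
score_message() {
    "$SPAMC" -c -s "$MAX_SIZE" < "$1"
}
```

The daemon side would be started once, for example with
"spamd -d -m 3 --max-conn-per-child=15", so the rulesets are parsed a
single time instead of once per message.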



>        http://www.rulesemporium.com/rules/bigevil.cf

Matt Y already pointed this out, but just to underline it, bigevil will waste
TRULY massive amounts of resources on your system.

Even the author of bigevil (Chris S.) strongly recommends that nobody use it,
and if you go to the website now, it's been deleted to prevent anyone from using
it anymore.

You should easily cut 30MB or more off the size of your processes if you remove
bigevil.

In general it looks like you downloaded every optional ruleset in the world and
added it to your configuration before you started off. I would strongly
discourage taking that approach with any kind of server application, and
that's especially true for spamassassin.

Start off running SA without *ANY* add on rulesets, then start adding them a few
at a time. This way if you add a bloated ruleset like bigevil, the cause of the
problem is immediately obvious.

Be very wary of any ruleset which has a .cf file that's greater than 64k in size.
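
One way to spot such files, as a sketch: the function name is made up, and
the directory you pass in is whatever location your add-on rules live in
(/etc/mail/spamassassin is a common default, but that is an assumption
about your install).

```shell
# List .cf files under directory $1 that are larger than 64k (65536 bytes).
# The "c" suffix tells find to compare the size in bytes.
big_rulesets() {
    find "$1" -type f -name '*.cf' -size +65536c | sort
}
```

Running "big_rulesets /etc/mail/spamassassin" after each rules_du_jour
update would show a bloated ruleset as soon as it arrives.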

Matt Y's comments on duplicated rulesets (such as antidrug.cf, and having both
the pre and post 2.5x versions of several rulesets) are also valid.

> Q) With spamassassin (and all the above info) you need about 20 to 30 seconds per email message and LOTS of RAM and CPU:
>    a) TRUE
>    b) FALSE 

a) TRUE, due to misconfiguration. With some tuning based on the tips above, this
will readily change to b) FALSE.

Re: Simple question TRUE or FALSE (More data to answer this question)

Posted by Matt Yackley <sa...@yackley.org>.
Hi David,
A few quick tips to help performance...

David Velásquez Restrepo said:
SNIP

>         http://www.rulesemporium.com/rules/bigevil.cf
Do not, I repeat do not use this file, it grew way too big.  This type of test is
better handled by SURBL.

>         http://mywebpages.comcast.net/mkettler/sa/antidrug.cf
If you are running SA >= 3.0.0, antidrug is built in to SA.

>         http://www.stearns.org/sa-blacklist/sa-blacklist.current
>         http://www.stearns.org/sa-blacklist/sa-blacklist.current.uri.cf
Might want to drop these as well in favor of SURBL tests, at least the uri version.

>         http://www.rulesemporium.com/rules/99_sare_fraud_post25x.cf
>         http://www.rulesemporium.com/rules/99_sare_fraud_pre25x.cf
Depending on your SA version run only one of the above rulesets.

>         http://www.rulesemporium.com/rules/72_sare_bml_post25x.cf
>         http://www.rulesemporium.com/rules/71_sare_bml_pre25x.cf
Depending on your SA version run only one of the above rulesets.

>         http://www.rulesemporium.com/rules/71_sare_redirect_pre3.0.0.cf
>         http://www.rulesemporium.com/rules/72_sare_redirect_post3.0.0.cf
Depending on your SA version run only one of the above rulesets.

SNIP

Are you running a caching DNS server?  A caching nameserver will help quite a bit
with the net tests.

> So, here again is the "simple", but not short, question:
> Q) With spamassassin (and all the above info) you need about 20 to 30
> seconds per email message and LOTS of RAM and CPU:
>     a) TRUE
>     b) FALSE

Correct the above items and see how it runs after the changes.

Cheers,

matt

Re: Simple question TRUE or FALSE (More data to answer this question)

Posted by Loren Wilton <lw...@earthlink.net>.
> Software:
> --------------------------------------------------------------
> A Perl script which takes a file and tests it using Mail::SpamAssassin to

Which version of SA?


> Using: Net test, Bayes, Razor2, DCC, Pyzor, SPF Test (and everything else
> suggested by spamassassin)
> Rules:
>     rules_du_jour:
>         http://www.rulesemporium.com/rules/bigevil.cf

That's your first problem.  We've been telling people for months to GET RID
OF THIS THING.  Probably causing 80% of your problems.

>         http://mywebpages.comcast.net/mkettler/sa/antidrug.cf

If you are on 3.x you shouldn't be running this, it is built in.  If you
aren't running 3.x, why not?

>         http://www.rulesemporium.com/rules/99_sare_fraud_post25x.cf
>         http://www.rulesemporium.com/rules/99_sare_fraud_pre25x.cf

>         http://www.rulesemporium.com/rules/72_sare_bml_post25x.cf
>         http://www.rulesemporium.com/rules/71_sare_bml_pre25x.cf

I think you aren't reading rule descriptions on our site.  Those are two
files, ONE is supposed to be used if you are on 2.4x or before, and the
OTHER if you are on 2.5x or later.

It is physically impossible for a version of SA to be BOTH a version before
and after 2.50.

>         http://www.rulesemporium.com/rules/71_sare_redirect_pre3.0.0.cf
>         http://www.rulesemporium.com/rules/72_sare_redirect_post3.0.0.cf

Same basic problem.  Here you are claiming that your version of SA is both
before and after 3.0.0.  Up above you claimed it was both before and after
2.50.
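
A quick way to catch this kind of conflict in a ruleset list, sketched
under the assumption that the files follow the NN_name_preX.cf /
NN_name_postX.cf naming seen above. The function name is made up for
illustration.

```shell
# Read ruleset URLs or file names on stdin, one per line, and print every
# ruleset family that is listed in both a "pre" and a "post" variant.
conflicting_rulesets() {
    # Strip the path, then reduce e.g. 99_sare_fraud_post25x.cf to
    # "sare_fraud post". Names without a pre/post suffix are dropped.
    sed -n -E 's#^.*/##; s#^[0-9]+_(.*)_(pre|post)[0-9.x]+\.cf$#\1 \2#p' |
        sort -u |
        # A family seen with more than one distinct variant has both.
        awk '{ seen[$1]++ } END { for (f in seen) if (seen[f] > 1) print f }' |
        sort
}
```

Fed the list from the original message, this should report sare_bml,
sare_fraud and sare_redirect, the three families discussed here.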

Throw out the junk you shouldn't have in those rule sets and things might
work better.

        Loren


Re: Simple question TRUE or FALSE (More data to answer this question)

Posted by Menno van Bennekom <mv...@xs4all.nl>.
> Q) With spamassassin (and all the above info) you need about 20 to 30
> seconds per email message and LOTS of RAM and CPU:
>     a) TRUE
>     b) FALSE
My answer is b), False.
I have a mailserver here that has a 1 GHz CPU and 512 MB RAM and SA on that
server usually takes 2 or 3 seconds per message.
As already posted, some of your rulesets are unnecessary because they
are included in SA (standard rulesets or SURBL).
Did you check 'cat messages | spamassassin -D' to see what part takes most
time? DNS time-outs can take a lot of time for example (also checkable
with tcpdump port 53).
Also your SMTP-server (xmail?) takes a lot of CPU. I've never used Xmail
but I use postfix (and amavisd-new) and I think it's quite memory and CPU
efficient.
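
A crude helper for seeing where the time goes, as a sketch: the function
name is made up, and it only gives whole-second resolution, which is
enough when the scan takes 20 to 30 seconds.

```shell
# Run the given command with its output discarded and print how many
# whole seconds it took. Standard input is passed through to the command.
elapsed_seconds() (
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
)
```

Comparing "elapsed_seconds spamassassin < message" with
"elapsed_seconds spamassassin -L < message" (local tests only) gives a
rough idea of how much of the time is spent in network tests, and
"elapsed_seconds host spamassassin.apache.org" shows whether plain DNS
lookups are slow.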

Menno van Bennekom