Posted to users@spamassassin.apache.org by Robert Fitzpatrick <li...@webtent.net> on 2005/12/15 00:11:30 UTC

Re: Timing totals--

On Wed, 2005-12-14 at 17:41 -0500, Matt Kettler wrote:
> Robert Fitzpatrick wrote:
> You can improve speed by:
> 1) disabling things, such as Bayes, URIBLs, and RBLs
> 2) If you are using bayes switching from DB_File BayesStore to SQL (recommended)
> or SDBM (fast but not well tested) will yield considerable gains.
> 3) Minimizing your add-on rulesets.
> 
> I'd suggest doing a little experiment: disable DNS and Bayes and see what
> happens to your scan times.
> 
> /etc/mail/spamassassin/local.cf:
> use_bayes 0
> dns_available no
> 
> Be sure to restart amavis to re-parse these options. Doing this will let more
> spam slip by, but it will quickly tell you whether one or the other of
> these features is your problem.
> 
> If scan times improve substantially, try turning bayes on and see what happens.
> Then turn bayes off and turn on DNS and see what happens. This will help
> determine which feature is causing your system the extra slowdown.

I had tried dns_available no before, but disabling Bayes as well seems to
have done the trick. My timings are mostly 300-500 ms, with some at
1000 ms. Timing drops to these levels after disabling DNS, but my queue
doesn't start dropping until I disable both; then, wham, down she
goes... thanks.

But now, what do I need to know about these features? Is it my Berkeley
DB? DNS seems to be fine on the server.

--
Robert


Re: Timing totals

Posted by Robert Fitzpatrick <li...@webtent.net>.
On Wed, 2005-12-14 at 19:01 -0500, Matt Kettler wrote:
> >>Note that "phase 2" reflects the time in seconds to scan 2000 messages using
> >>spamc. MySQL and SDBM are nearly 3 times faster at this.
> >>
> >>Since SQL is well-tested, that might be a better way for you to go. SDBM has
> >>some issues.
> > 
> > 
> > I already have MySQL on the server. I guess I can change this in
> > local.cf and look up the instructions for changing over; any thoughts or
> > warnings?
> 
> Search the wiki, the wiki is your friend. :)

My issue was finally traced last night to duplicate .cf files in my
spamassassin config folder. I run RulesDuJour and it puts the files in a
subfolder, but there were duplicates in the config folder. Thing is,
why did this not cause an issue using SA 3.0?

Once I disabled DNS and Bayes, things worked, but the dups were still
being processed. I removed the dups and voilà! Once I got that done, it runs
fine with DNS and Bayes enabled. I even took amavis back up to
max_servers of 10. But I will change to MySQL. Thanks for the help!

I read one of your other posts about antidrug being in 3.1 already; are there
any others? Also, I have some other recipes called 'Sober_German_Spam' and
'SOBER_P_SPAM' that I picked up from the web in my local.cf; are these still
valid?

--
Robert


Re: Timing totals--

Posted by Matt Kettler <mk...@evi-inc.com>.
Robert Fitzpatrick wrote:
> On Wed, 2005-12-14 at 18:29 -0500, Matt Kettler wrote:
> 
>>For DNS, well, DNS lookups are by nature slow, and SA makes a lot of them. You
>>can improve the speed a little by running a caching nameserver on the local
>>host, but that's not a "fix-all".
> 
> 
> Ah, that is one difference between this server and the previous one: BIND was
> on the old server, but I don't think that server was using localhost
> for DNS; it was using the same server I am now using on this one. Would
> spamassassin use localhost DNS over an outside server if the system is set to a
> different server on the network?

No. SpamAssassin uses the servers out of /etc/resolv.conf.

Personally, I prefer to keep at least a forward-only caching server on localhost
and include it as the first nameserver in /etc/resolv.conf. This really saves a
lot of time as many of the DNS queries issued by SA will be repeated as it sees
spam from the same source IP over and over and over again.

It doesn't help for the first message, but helps a lot for the follow-up ones
with the same RBL entries.
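Here is a minimal sketch of that layout. It assumes a caching nameserver (BIND or similar, set up to forward) is already listening on 127.0.0.1. The second address is a hypothetical placeholder for whatever upstream resolver you use now:

/etc/resolv.conf:
# local caching/forwarding nameserver, listed first so it is tried first
nameserver 127.0.0.1
# hypothetical placeholder: your existing upstream resolver, used as a fallback
nameserver 192.0.2.53

With this ordering, repeated RBL lookups for the same source IPs should be answered from the local cache instead of going back out to the network.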



>>Note that "phase 2" reflects the time in seconds to scan 2000 messages using
>>spamc. MySQL and SDBM are nearly 3 times faster at this.
>>
>>Since SQL is well-tested, that might be a better way for you to go. SDBM has
>>some issues.
> 
> 
> I already have MySQL on the server. I guess I can change this in
> local.cf and look up the instructions for changing over; any thoughts or
> warnings?

Search the wiki, the wiki is your friend. :)

> 
> You the man, been struggling with this since yesterday on the amavis and
> postfix lists.
> 

Good luck!

Re: Timing totals--

Posted by Matt Kettler <mk...@evi-inc.com>.
Robert Fitzpatrick wrote:
> On Wed, 2005-12-14 at 17:41 -0500, Matt Kettler wrote:
> 
>>Robert Fitzpatrick wrote:
>>You can improve speed by:
>>1) disabling things, such as Bayes, URIBLs, and RBLs
>>2) If you are using bayes switching from DB_File BayesStore to SQL (recommended)
>>or SDBM (fast but not well tested) will yield considerable gains.
>>3) Minimizing your add-on rulesets.
>>
>>I'd suggest doing a little experiment: disable DNS and Bayes and see what
>>happens to your scan times.
>>
>>/etc/mail/spamassassin/local.cf:
>>use_bayes 0
>>dns_available no
>>
>>Be sure to restart amavis to re-parse these options. Doing this will let more
>>spam slip by, but it will quickly tell you whether one or the other of
>>these features is your problem.
>>
>>If scan times improve substantially, try turning bayes on and see what happens.
>>Then turn bayes off and turn on DNS and see what happens. This will help
>>determine which feature is causing your system the extra slowdown.
> 
> 
> I had tried dns_available no before, but disabling Bayes as well seems to
> have done the trick. My timings are mostly 300-500 ms, with some at
> 1000 ms. Timing drops to these levels after disabling DNS, but my queue
> doesn't start dropping until I disable both; then, wham, down she
> goes... thanks.
> 
> But now, what do I need to know about these features? Is it my Berkeley
> DB? DNS seems to be fine on the server.

For DNS, well, DNS lookups are by nature slow, and SA makes a lot of them. You
can improve the speed a little by running a caching nameserver on the local
host, but that's not a "fix-all".

You can also try lowering rbl_timeout to 10 or so to put a shorter limit on
how long SA will wait for tardy DNS servers. This comes at the expense of
missing some responses that might have arrived from slower servers.
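For example, assuming a 3.1-era local.cf (rbl_timeout is given in seconds):

/etc/mail/spamassassin/local.cf:
# stop waiting for slow DNS blocklist answers after roughly 10 seconds
rbl_timeout 10

As with the other options, amavis needs a restart to re-parse this setting.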


For Bayes, there are some things to look into:

If you stay with DB_File (Berkeley DB) or choose to switch to SDBM:

1) If you aren't accessing databases over NFS, change your lock_method to flock.
nfssafe is the default, but it's slower than flock.

2) Turn bayes_learn_to_journal on. This will greatly reduce lock contention on
the bayes DB when autolearning is going on.
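A sketch of those two settings. It assumes the Bayes DB lives on a local (non-NFS) filesystem:

/etc/mail/spamassassin/local.cf:
# only safe when the bayes DB is NOT on NFS; nfssafe is the default
lock_method flock
# write learning results to the journal instead of directly to the DB
bayes_learn_to_journal 1

The journal is synced into the main DB later. Running sa-learn --sync forces that sync by hand.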

As for Bayes DB types, Berkeley DB is undeniably the slowest at scanning messages.

http://wiki.apache.org/spamassassin/BayesBenchmarkResults

Note that "phase 2" reflects the time in seconds to scan 2000 messages using
spamc. MySQL and SDBM are nearly 3 times faster at this.

Since SQL is well-tested, that might be a better way for you to go. SDBM has
some issues.

Either way, if you change DBs you'll want to run
"sa-learn --backup > bayesbackup", change the bayes_store_module setting, and
then run "sa-learn --restore bayesbackup".
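If you go the MySQL route, the bayes_store_module change might look roughly like the sketch below. It rests on several assumptions. The database name, user, and password are hypothetical placeholders. The Bayes tables must already exist, created from the MySQL schema that ships with SpamAssassin (the wiki covers the exact steps):

/etc/mail/spamassassin/local.cf:
# switch Bayes storage from DB_File to the SQL store
bayes_store_module Mail::SpamAssassin::BayesStore::SQL
# hypothetical DSN: database "spamassassin" on the local MySQL server
bayes_sql_dsn DBI:mysql:spamassassin:localhost
# hypothetical credentials; replace with a real MySQL account
bayes_sql_username sa_user
bayes_sql_password sa_password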

Unfortunately, sa-learn --restore doesn't work so well on SDBM, which is why I'd
be reluctant to go this way unless you're ready to brave the unknown and jump
through some hoops:

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4670