Posted to users@spamassassin.apache.org by Paul Boven <p....@chello.nl> on 2004/02/06 21:35:10 UTC

Re: optimum configuration without bayes

Hi Gareth,

Gareth wrote:
> I'm about to implement a SpamAssassin setup for nearly one hundred users,
> I'll be using Amavis-new, and so can't have Bayes per user.... should I
> avoid Bayes altogether?
> 
> Any suggestions for an optimum configuration for an installation without
> using Bayes... how successful will SpamAssassin be without Bayes...?

To get some idea of how good SpamAssassin could be with/without certain 
features, have a look in 50_scores.cf, wherever that lives on your server.

At the top, in the comments, it lists the percentages of false positives 
and negatives the developers found using the default weights for each of 
the 'sets'. This is how the default weights for the rules were 
calculated. This assumes you're using a threshold of 5.

From my SpamAssassin-2.61 50_scores.cf:

Set 0: (Pure SpamAssassin rules)
False positives: 0.06% (0.16% of nonspam)
False negatives: 3.87% (5.93% of spam)

Set 1: (SpamAssassin + DNS lookups etc.)
False positives: 0.07% (0.21% of nonspam)
False negatives: 3.79% (5.82% of spam)

Set 2: (SpamAssassin + Bayes)
False positives: 0.05% (0.09% of nonspam)
False negatives: 1.45% (3.13% of spam)

Set 3: (SpamAssassin + Bayes + DNS etc.)
False positives: 0.04% (0.1% of nonspam)
False negatives: 0.49% (0.92% of spam)
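
To put a number on what Bayes buys you, here is a rough sketch that 
compares the "% of spam" false-negative figures quoted above. The 
dictionary and the helper name relative_reduction are mine, not anything 
shipped with SpamAssassin, and the comparison only holds as far as the 
developers' test corpora resemble your own mail.

```python
# False-negative rates as a percentage of spam, copied from the
# 50_scores.cf comments quoted above (SpamAssassin 2.61, threshold 5).
FN_PERCENT_OF_SPAM = {
    0: 5.93,  # rules only
    1: 5.82,  # rules + DNS lookups etc.
    2: 3.13,  # rules + Bayes
    3: 0.92,  # rules + Bayes + DNS etc.
}


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which the miss rate drops going from `before` to `after`."""
    return (before - after) / before


# Adding Bayes without network tests: set 0 -> set 2.
bayes_local = relative_reduction(FN_PERCENT_OF_SPAM[0], FN_PERCENT_OF_SPAM[2])
# Adding Bayes with network tests: set 1 -> set 3.
bayes_net = relative_reduction(FN_PERCENT_OF_SPAM[1], FN_PERCENT_OF_SPAM[3])

print(f"Bayes, no network tests:   {bayes_local:.0%} fewer missed spams")
print(f"Bayes, with network tests: {bayes_net:.0%} fewer missed spams")
```

On these figures Bayes cuts the missed spam by roughly 47% without 
network tests and roughly 84% with them.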

At my company we're using Bayes in a site-wide mode, without AWL. 
SpamAssassin runs without any DNS or other external lookups, though 
sendmail does reject all nonexistent domains. We've been running it for 
4 weeks now, and the percentages I'm seeing for the last week are:
False negatives: 0.98% (2.6% of spam)
False positives: none (that we know of).
This ignores the 34% of all incoming messages that were flat-out 
rejected by sendmail: though they were hopefully all spam and other 
unwanted stuff, I can't guess how the filter would have performed on those.
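
For what it's worth, the two false-negative figures above also imply how 
much of the accepted mail was spam. A back-of-the-envelope sketch follows; 
it assumes both percentages refer to the same set of messages that got 
past sendmail, and the "upper bound" line assumes every rejected message 
was spam, which is only a hope. The variable names are made up for 
illustration.

```python
# Figures from the last week of our site-wide setup, as percentages.
fn_percent_of_all = 0.98   # missed spam as a share of all accepted mail
fn_percent_of_spam = 2.6   # missed spam as a share of accepted spam
rejected_share = 0.34      # share of incoming mail rejected by sendmail

# If x% of all mail is missed spam and that is y% of the spam,
# then spam makes up x / y of the accepted mail.
spam_share_accepted = fn_percent_of_all / fn_percent_of_spam

# ASSUMPTION: every message sendmail rejected was spam. This gives an
# upper bound on the spam share of everything that tried to come in.
spam_share_incoming_upper = (
    rejected_share + (1 - rejected_share) * spam_share_accepted
)

print(f"Spam share of accepted mail:        {spam_share_accepted:.1%}")
print(f"Spam share of all incoming (upper): {spam_share_incoming_upper:.1%}")
```

So a bit over a third of what reaches the filter is spam, and at most 
around 59% of everything that hits the server.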

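All in all, we seem to do better with our site-wide Bayesian filtering 
than should be expected on the basis of SpamAssassin's own tests: we 
miss 2.6% of spam, against 3.13% for Set 2 (SpamAssassin + Bayes), which 
is the closest match to our setup.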

Hope this helps,

Regards, Paul Boven.