Posted to users@spamassassin.apache.org by Dr Robert Young <rc...@aliconsultants.com> on 2005/06/29 18:39:48 UTC

On the subject of training ( ie frequency)...

Being new to SA, I thought I would inquire as to the frequency of 
training/rules updates required for SA. I know the spammers are an 
industrious lot, so existing rules and stats would eventually lose 
their effectiveness. However, what have people found to be a good 
training/update schedule in the "real world"....

immediately?
daily?
monthly?
.
.

I see a great deal of activity on the group dealing with rules, etc.  
This tends to "suggest" that it is a race to keep one step ahead (or at 
least even) with the junk mailers (which I can certainly believe).

I have a system where the previous person trained the Bayes stats once 
about 1-2 yrs ago, has used only the "default" rules, and has "auto 
learn" on. Reports are it has been losing its effectiveness slowly over 
this period, hence my question.

I'm trying to determine what one "should" do.


Re: On the subject of training ( ie frequency)...

Posted by Loren Wilton <lw...@earthlink.net>.
> I have a system where the previous person trained the Bayes stats once
> about 1-2 yrs ago, has used only the "default" rules, and has "auto
> learn" on.

> I'm trying to determine what one "should" do.

First, you should probably upgrade to a recent version of SA.  2 years ago
would probably be older than 2.6something, at a wild guess.  If you aren't
running at least 2.63/2.64 I'd suggest biting the bullet and simply going to
3.0.4 at once.

If you are running 2.63 you really need to go to 2.64, which should be
completely painless.  Or of course all the way up to the current version.
2.6x probably won't be real well supported all that much longer, with 3.1
out the door soon.

> Being new to SA, I thought I would inquire as to the frequency of
> training/rules updates required for SA.

There are people that tell you you don't need to use Bayes at all, and they
are right.  OTOH, if your system really has gone 2 years without any
maintenance and is still stopping a respectable percentage of spam it is
probably a testament to how well Bayes can work.  The original guy must have
tweaked the auto-learn thresholds a little, since the default values can
lead to bad results within 6 months or so if they aren't watched.

My personal off the wall suggestions on running a system:

First, upgrade to the latest if you aren't on at least 2.63, and consider
strongly upgrading if you are.

Second, consider not using auto-learning, and instead manually learning some
spam and ham.  You will have to feed the system a few hundred each of spam
and ham to get it started, and you will probably want to be moderately
diligent for a week or so at giving it a few more of each every day.  In
particular, watch the bayes scores on low-scoring spam, and feed anything
that isn't getting bayes_99 to bayes.  Also watch ham, and feed anything
that is pretty typical and gets more than about a bayes_20 score to bayes
as ham.

If you get a 3.0.x release older than 3.0.4, change the score on bayes_99
to something around 4.  It got auto-scored to waaay too low a value during
the 3.0 scoring.

If you do want to use auto-learning (which I would not turn on until I
manually started Bayes in the right direction with manual learning), change
the autolearn-ham threshold to something like -.1 instead of the positive
value it has now.

Third, go off to rulesemporium and ACTUALLY READ (many people refuse to do
this, then complain) the descriptions of the rulesets, and decide which ones
would be reasonable in your situation.  Pick those up manually, or even
better, set up RDJ to pull any new versions once every day or two or three.
They are rarely updated more often than once a week.

Fourth, if you want to use net tests, make sure that you actually get them
working (right version of Net::DNS and all), and make sure all of the surbl
rules are working.

And that should do it for you.  Keep an eye on ham and spam scores, and if
using bayes, on bayes scores for low-scoring spam and high-scoring ham.
Obviously look at any FPs and FNs and take appropriate corrective action if
needed.  Occasionally toss a handful of spam and ham to Bayes, and
occasionally (if not automatically) update the rulesets.
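
The watching described above can be partly scripted.  What follows is only a
rough sketch in Python, not anything that ships with SA.  It assumes each
message carries an X-Spam-Status header with a tests= list, as SA normally
writes it.  The function name training_hint is made up for illustration, and
the cutoffs (bayes_99 for spam, about bayes_20 for ham) are simply the ones
suggested earlier in this message.

```python
import re
from email import message_from_string

# BAYES_xx rule names as they appear in the tests= list, e.g. BAYES_99.
BAYES_RE = re.compile(r"BAYES_(\d+)")


def training_hint(raw_message, is_spam):
    """Return 'spam', 'ham', or None for one hand-classified message.

    is_spam is the human verdict on the message. The cutoffs are
    assumptions taken from the advice in this thread, not SA defaults.
    """
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    # The header may be folded over several lines; drop all whitespace
    # so a rule name split from its neighbours is still found.
    status = "".join(status.split())
    match = BAYES_RE.search(status)
    bucket = int(match.group(1)) if match else None

    if is_spam:
        # Spam that did not reach BAYES_99 (or got no Bayes hit at all).
        return "spam" if bucket != 99 else None
    # Ham that Bayes scored above roughly BAYES_20.
    if bucket is not None and bucket > 20:
        return "ham"
    return None
```

Messages that come back as 'spam' or 'ham' could then be saved to a folder
and fed to Bayes with sa-learn --spam or sa-learn --ham.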

        Loren


Re: On the subject of training ( ie frequency)...

Posted by Andy Jezierski <aj...@stepan.com>.
Dr Robert Young <rc...@aliconsultants.com> wrote on 06/29/2005 11:39:48 
AM:

[snip]
> 
> I have a system where the previous person trained the Bayes stats once 
> about 1-2 yrs ago, has used only the "default" rules, and has "auto 
> learn" on. Reports are it has been losing its effectiveness slowly over 
> this period, hence my question.
> 
> I'm trying to determine what one "should" do.
> 

Look at the Spam headers on your messages. For those that are non-spam, 
are you seeing low Bayes scores, Bayes_00, Bayes_05 etc.?  For your spam 
messages, are you seeing high Bayes scores?  Bayes_99, Bayes_80, etc.?  If 
so, your bayes training is probably OK. If not, you may want to think 
about doing some manual training for both ham & spam.
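
If eyeballing headers gets tedious, a tally helps.  This is a rough Python
sketch under a few assumptions: you have already pulled the X-Spam-Status
values out of a folder of known ham (or known spam), and the rule names
appear as BAYES_xx in the header.  The function name bayes_histogram and the
sample header strings are invented for illustration.

```python
import re
from collections import Counter

# Matches rule names such as BAYES_00 or BAYES_99 inside the header.
BAYES_RE = re.compile(r"\bBAYES_(\d+)\b")


def bayes_histogram(status_headers):
    """Count BAYES_xx hits across an iterable of X-Spam-Status values.

    Headers with no Bayes rule at all are counted under 'none'.
    """
    counts = Counter()
    for status in status_headers:
        match = BAYES_RE.search(status)
        key = f"BAYES_{match.group(1)}" if match else "none"
        counts[key] += 1
    return counts


# Made-up sample values standing in for a folder of known ham.
ham_headers = [
    "No, score=-1.2 required=5.0 tests=BAYES_00 autolearn=ham",
    "No, score=0.1 required=5.0 tests=AWL,BAYES_05 autolearn=no",
    "No, score=2.4 required=5.0 tests=BAYES_60,HTML_MESSAGE autolearn=no",
]
for rule, count in sorted(bayes_histogram(ham_headers).items()):
    print(rule, count)
```

A ham folder that piles up in the low buckets and a spam folder that piles up
in the high ones suggests the training is in decent shape; a lot of entries
in the middle or under "none" points toward manual training.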

For the spam that does make it through, you may want to classify the 
different types, and take a look at some of the SARE rules at 
rulesemporium to see if they would be a fit. Also, what version of SA are 
you running?  Are you using SURBL? If it is 3.0.0 or higher you should be OK, 
if you're still running 2.6x you may want to upgrade (preferable) or 
install the SURBL patch. 

Myself, I run with about 25 or so SARE rule sets that are checked for 
updates about once a week, along with a few others. Greylisting, Razor, 
Pyzor, DCC, SURBL, and just recently URIBL.  I have very, very little spam 
that slips through unflagged.

Andy