Posted to users@spamassassin.apache.org by Dr Robert Young <rc...@aliconsultants.com> on 2005/06/30 17:22:12 UTC

A training issue of good vs bad

I have verified that the installation I am taking over has SpamAssassin 
2.63. It was trained (I presume) on good & bad emails about 1-2 yrs 
ago. SPAM has been "automatically" collected from users and fed back 
into the system. However, there has been no updating on the "good" 
email training.

Over time, more and more spam has been getting through. Would "not" 
updating the "good" email training cause such an effect to be seen?


Re: A training issue of good vs bad

Posted by Matt Kettler <mk...@evi-inc.com>.
Dr Robert Young wrote:
> I have verified that the installation I am taking over has SpamAssassin
> 2.63. It was trained (I presume) on good & bad emails about 1-2 yrs ago.
> SPAM has been "automatically" collected from users and fed back into the
> system. However, there has been no updating on the "good" email training.

(Word of warning: SA 2.63 contains a DoS vulnerability. Both 2.64 and 3.0.4 are
free of DoS vulnerabilities, and you should consider an upgrade when you can.)

> Over time, more and more spam has been getting through. Would "not"
> updating the "good" email training cause such an effect to be seen?
> 


No. Failure to train more ham would cause FPs (false positives), not missed
spam.  (ham = non-spam = "good" mail).


HOWEVER, one thing that might be biting you is the autolearner. By default, SA
will autolearn any very-low-scoring email as ham.

Since you're running a relatively old version of SA, its static rules might be
failing to detect some of the latest Nigerian scams, and it may autolearn them
as ham. Over time this can cause a considerable mis-learning bias.
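
To make the failure mode concrete, here is a toy model of threshold-based
autolearning. It is a simplified sketch, not SA's actual logic (the real
autolearner applies further conditions). The function name is made up for
illustration, and the two threshold values are assumptions based on what I
recall for bayes_auto_learn_threshold_nonspam and
bayes_auto_learn_threshold_spam; check the documentation for your own version.

```python
# Toy model of threshold-based autolearning (not SpamAssassin's real code).
# Both threshold values are assumptions; verify them against your install.
HAM_THRESHOLD = 0.1    # below this score: learn as ham (assumed)
SPAM_THRESHOLD = 12.0  # at or above this score: learn as spam (assumed)


def autolearn_decision(score: float) -> str:
    """Return 'ham', 'spam', or 'none' for a message's static-rule score."""
    if score < HAM_THRESHOLD:
        return "ham"
    if score >= SPAM_THRESHOLD:
        return "spam"
    return "none"


# Hypothetical scores: a scam that outdated rules fail to hit at all,
# an ordinary borderline message, and an obvious spam.
examples = [("missed scam", 0.0), ("borderline", 4.2), ("obvious spam", 15.3)]
for label, score in examples:
    print(label, "->", autolearn_decision(score))
```

The point is the first case: a spam that no static rule hits scores near zero,
so it falls under the ham threshold and gets learned as "good" mail.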


IMNSHO this is a general problem with the default ham learning threshold in SA.
If you don't keep your SA up-to-date and carefully watch the autolearner for
mis-learning, the autolearner will eventually go astray and heavily poison your
bayes database for some of the more subtle forms of spam.
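
A rough illustration of what "poisoning" means for a single token. This uses a
naive per-token estimate from training counts, which is a simplification and
not the exact computation SA's bayes code performs. The function name and all
the counts are hypothetical.

```python
def token_spam_probability(spam_hits: int, ham_hits: int,
                           n_spam: int, n_ham: int) -> float:
    """Naive estimate of how spammy a token is, from training counts."""
    spam_rate = spam_hits / n_spam  # fraction of learned spam with the token
    ham_rate = ham_hits / n_ham     # fraction of learned ham with the token
    return spam_rate / (spam_rate + ham_rate)


# Hypothetical counts for one scam-typical token in a healthy database.
before = token_spam_probability(spam_hits=90, ham_hits=1,
                                n_spam=1000, n_ham=1000)

# Suppose 60 missed scams containing that token were autolearned as ham.
after = token_spam_probability(spam_hits=90, ham_hits=61,
                               n_spam=1000, n_ham=1060)

print(f"before: {before:.2f}  after: {after:.2f}")
```

With these made-up numbers the token drops from a strong spam indicator to
something close to neutral, which is why later scams of the same kind stop
getting any help from bayes.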

With your "automatic" feedback from users, this problem should be mitigated, but
that's a bit dependent on your users actually bothering to feed back problem
emails.