Posted to users@spamassassin.apache.org by Robert Menschel <Ro...@Menschel.net> on 2005/02/11 21:32:07 UTC

Re[2]: bayesian filter training

Hello Peter,

Friday, February 11, 2005, 4:17:33 AM, you wrote:

PM> but would that not mean that the bayes filter will learn the headers
PM> that SpamAssassin adds as spam, and then after a while only start
PM> classing mail that already has the spam headers as BAYES_99?

No, since the sa-learn process knows to ignore the SpamAssassin
headers.
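
To illustrate the idea (not sa-learn's actual code), here is a minimal
Python sketch that drops SpamAssassin markup headers from a message before
anything is tokenized. It assumes the markup headers all start with
"X-Spam-"; the function name strip_sa_headers is made up for this example,
and the real ignore list inside SpamAssassin may differ.

```python
from email import message_from_string

# Assumption for this sketch: SpamAssassin's markup headers share this prefix.
SA_HEADER_PREFIX = "x-spam-"


def strip_sa_headers(raw_message: str) -> str:
    """Return the message text with every X-Spam-* header removed."""
    msg = message_from_string(raw_message)
    # Copy the header names first, since we delete while iterating.
    for name in list(msg.keys()):
        if name.lower().startswith(SA_HEADER_PREFIX):
            # Removes all occurrences; no error if already removed.
            del msg[name]
    return msg.as_string()


raw = (
    "From: a@example.com\n"
    "X-Spam-Flag: YES\n"
    "X-Spam-Status: Yes, score=12.3\n"
    "Subject: hi\n"
    "\n"
    "buy now\n"
)
cleaned = strip_sa_headers(raw)
```

With the markup gone, the learner only sees what the sender actually wrote,
so the "already flagged" state of the message cannot feed back into the
token database.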

I feed EVERYTHING into Bayes, once it's been manually
verified/classified.

Bob Menschel




PM> I really do not know, I am just asking.

PM> Peter

PM> Matt Kettler wrote:
>> At 05:06 PM 2/10/2005, Matias Lopez Bergero wrote:
>> 
>>> Just a question,
>>> Is it worth training the bayes filter with messages already detected
>>> and flagged as spam by spamassassin? Would that do any good?
>> 
>> 
>> Yes. And even if they are already flagged as BAYES_99 it is still 
>> worthwhile.
>> 
>> 
>> The reason why is that bayes does not learn whether a message is spam
>> or not. Bayes learns that a given set of words and tokens was seen in
>> spam. A given spam message might be scored as spam and might already
>> score high on the bayes scale, but it can still contain valuable new
>> words to learn from. In particular, the constant mutation of drug-name
>> spellings provides a steady stream of fresh spam indicators for bayes
>> to learn about. Learning these helps it identify future spam messages
>> that might not otherwise look very spam-like, and offers you some
>> protection from false negatives caused by spam mutations.
>> 
>> 
>> The only time it's not worthwhile is if the message was already learned
>> as spam (i.e., by the autolearner), but in that case SA will just ignore
>> you. You're wasting some CPU time, but you won't damage or corrupt
>> anything.
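
The two points above (new tokens are still worth learning, and re-learning
the same message is harmless) can be sketched with a toy token store in
Python. This is only an illustration under simplifying assumptions: the
TokenStore class and tokenize function are hypothetical, the tokenizer is a
bare regex, and duplicates are detected by comparing the full message text,
which is not necessarily how SpamAssassin tracks learned messages.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Very rough tokenizer: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TokenStore:
    """Toy store counting how many learned spam messages held each token."""

    def __init__(self) -> None:
        self.spam_counts: Counter[str] = Counter()
        self.seen: set[str] = set()  # messages already learned

    def learn_spam(self, message: str) -> bool:
        """Learn a message as spam; return False if it was already learned."""
        if message in self.seen:
            return False  # ignored: costs a little time, changes nothing
        self.seen.add(message)
        # Count each distinct token once per message.
        self.spam_counts.update(set(tokenize(message)))
        return True


store = TokenStore()
first = store.learn_spam("cheap viagra now")
second = store.learn_spam("cheap v1agra now")   # mutated spelling, new token
repeat = store.learn_spam("cheap viagra now")   # same message again
```

The second message would likely have scored as spam already because of
"cheap" and "now", yet learning it still adds the new token "v1agra". The
repeated message is skipped and the counts stay as they were.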