Posted to users@spamassassin.apache.org by Paul Boven <p....@chello.nl> on 2005/03/07 12:39:23 UTC

Bayes and msg-ids

Hi everyone,

Could anyone shed some light for me on how and when (and especially why) 
Bayes often generates its own Message-Ids when learning, instead of 
using the one provided in the message? I have a lot of Message-Ids that 
are '@sa-generated' in my Bayes database.
This also makes it a bit hard to check if a message was indeed correctly 
learned, because the real Message-Id never makes it into bayes_seen.

This is the scenario I'm worried about:
1.) A spam (e.g. a stock spam) goes through our filter machine and gets 
falsely autolearned as ham. I've seen this happen quite a few times.

2.) If the recipient happens to be one of our Exchange users, they 
forward it back (as an attachment). The mail is extracted from the 
attachment and fed back into Bayes.

If the Message-Id is not the same when the spam comes around the 
second time, it does get learned as spam, but that never corrects the 
wrong autolearning: the tokens stay counted as ham as well. In theory 
it would mean you could never get a Bayes probability over 50% for that 
particular spam... which indeed seems to happen for some of the 
stock spams of late.
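
To make the concern concrete, here is a toy sketch in Python. It is not 
SpamAssassin code: the class ToyBayesStore, its fields and the message 
IDs are all made up for illustration, and it simply assumes that a 
learner keyed on message ID can only undo an earlier learning when the 
same ID shows up again.

```python
from collections import Counter


class ToyBayesStore:
    """Hypothetical learner keyed on message ID (not SpamAssassin's code)."""

    def __init__(self):
        self.seen = {}         # message ID -> "ham" or "spam"
        self.ham = Counter()   # token -> number of ham messages containing it
        self.spam = Counter()  # token -> number of spam messages containing it

    def learn(self, msg_id, tokens, as_spam):
        label = "spam" if as_spam else "ham"
        previous = self.seen.get(msg_id)
        if previous == label:
            return  # already learned this way, nothing to do
        if previous is not None:
            # Same ID, opposite label: undo the earlier learning first.
            old = self.spam if previous == "spam" else self.ham
            old.subtract(set(tokens))
        new = self.spam if as_spam else self.ham
        new.update(set(tokens))
        self.seen[msg_id] = label


tokens = ["hot", "stock", "tip"]

# Case 1: the ID is stable, so the user report replaces the wrong ham entry.
stable = ToyBayesStore()
stable.learn("abc@example.com", tokens, as_spam=False)  # false autolearn
stable.learn("abc@example.com", tokens, as_spam=True)   # user report
print(stable.ham["stock"], stable.spam["stock"])        # prints: 0 1

# Case 2: the ID differs on the second pass, so the ham entry stays put.
drifting = ToyBayesStore()
drifting.learn("abc@example.com", tokens, as_spam=False)
drifting.learn("1234@sa-generated", tokens, as_spam=True)
print(drifting.ham["stock"], drifting.spam["stock"])    # prints: 1 1
```

In the second case each token ends up counted once as ham and once as 
spam, which is the balanced situation described above.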

Regards, Paul Boven.

Re: Bayes and msg-ids

Posted by Robert Menschel <Ro...@Menschel.net>.

Hello Paul,

Monday, March 7, 2005, 3:39:23 AM, you wrote:

PB> Could anyone shed some light for me on how and when (and especially why)
PB> Bayes often generates its own Message-Ids when learning, instead of
PB> using the one provided in the message? I have a lot of Message-Ids that
PB> are '@sa-generated' in my Bayes database.
PB> This also makes it a bit hard to check if a message was indeed correctly
PB> learned, because the real Message-Id never makes it into bayes_seen.

It's my understanding that this happens when SA could not identify a
Message-id for the email.
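
A rough sketch of what such a fallback could look like, in Python. This 
is an assumption for illustration only, not SA's actual code: the 
function name learning_id, the choice of hashing the Date header plus 
the first 1024 characters of the body, and the use of SHA-1 are all 
hypothetical. Only the '@sa-generated' suffix is taken from the 
database entries mentioned above.

```python
import hashlib
from email import message_from_string


def learning_id(raw_message: str) -> str:
    """Return the header Message-Id if present, else a content-derived ID."""
    msg = message_from_string(raw_message)
    header = (msg.get("Message-Id") or "").strip("<> \t\r\n")
    if header:
        return header

    # Hypothetical fallback: hash the Date header and the start of the body.
    date = msg.get("Date") or ""
    body = msg.get_payload()
    if not isinstance(body, str):
        body = raw_message  # multipart message: hash the raw text instead
    material = (date + "\0" + body[:1024]).encode("utf-8", "replace")
    return hashlib.sha1(material).hexdigest() + "@sa-generated"


with_id = (
    "Message-Id: <abc@example.com>\n"
    "Date: Mon, 7 Mar 2005 12:39:23 +0000\n"
    "\n"
    "hot stock tip\n"
)
without_id = (
    "Date: Mon, 7 Mar 2005 12:39:23 +0000\n"
    "\n"
    "hot stock tip\n"
)

print(learning_id(with_id))     # prints: abc@example.com
print(learning_id(without_id))  # 40 hex digits followed by @sa-generated
```

With a scheme like this, the generated ID depends on the hashed 
content, so a forwarded copy whose Date header or body was altered 
along the way would get a different ID than the original.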

Perhaps Bayes should not auto-learn messages without message ids? Is
this already part of the Bayes auto-learn algorithm?

Bob Menschel