Posted to users@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2005/11/18 18:55:01 UTC

Re: collaborative bayes bases

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

"Kevin W. Gagel" writes:
> >in wiki://BayesInSpamAssassin it is said:
> >Do not train Bayes on different mail streams or public spam
> >corpora. These methods will mislead Bayes into believing
> >certain tokens are spammy or hammy when they are not.
> >
> >Could you explain why this is so, and what could happen if we
> >teach Bayes from several mail servers?
> 
> The idea in training Bayes is to train it for your server.
> Using someone else's mail to train it results in a Bayes
> database trained for their email.
> 
> Their email may or may not resemble what you consider
> spam or ham. That is the problem.

Yes.  Another problem is that if you use only one class of mail from
that server, e.g. all the mail collected from that server is spam, then
your training will teach SpamAssassin to recognise all mail from that
server as spam, because tokens that merely identify the server (its
hostnames, headers and so on) only ever show up in the spam class.

In reality, that server often carries other types of mail as well as
spam, but unless you also train with those mails, SpamAssassin won't
know that.
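
To make the skew concrete, here is a rough sketch in Python.  It is not
SpamAssassin's actual Bayes implementation, only a simple smoothed
per-token estimate, and the token "relay.example.net", the message sets
and the function name are all made up for illustration.  Each message is
modelled as a set of tokens.

```python
def token_spam_prob(token, spam_msgs, ham_msgs):
    """Smoothed estimate of how spammy a token looks.

    Compares the fraction of spam messages containing the token with
    the fraction of ham messages containing it (add-one smoothing).
    Illustrative only; not the formula SpamAssassin uses.
    """
    spam_hits = sum(token in msg for msg in spam_msgs)
    ham_hits = sum(token in msg for msg in ham_msgs)
    spam_rate = (spam_hits + 1) / (len(spam_msgs) + 2)
    ham_rate = (ham_hits + 1) / (len(ham_msgs) + 2)
    return spam_rate / (spam_rate + ham_rate)


# Hypothetical training data: mail from your own server...
local_spam = [{"cheap", "pills"}] * 20
local_ham = [{"meeting", "agenda"}] * 20
# ...and mail collected from another server, whose hostname is a token.
foreign_spam = [{"relay.example.net", "cheap"}] * 20
foreign_ham = [{"relay.example.net", "invoice"}] * 20

# Train with only the spam from the other server.
skewed = token_spam_prob(
    "relay.example.net", local_spam + foreign_spam, local_ham
)

# Train with both classes from the other server.
balanced = token_spam_prob(
    "relay.example.net", local_spam + foreign_spam, local_ham + foreign_ham
)

print(f"spam only:    {skewed:.2f}")    # about 0.92
print(f"spam and ham: {balanced:.2f}")  # 0.50
```

With only the spam trained, the server's own hostname looks like a strong
spam sign; once its ham is trained too, the same token becomes neutral.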

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFDfhV0MJF5cimLx9ARAjkSAJ0WieDVB1sPy7KnWbXJUppZTrBnkgCgjYjc
GPQZTG45xQIzvkxxP6eL1/o=
=sI+S
-----END PGP SIGNATURE-----