Posted to users@spamassassin.apache.org by "David F. Skoll" <df...@roaringpenguin.com> on 2014/07/14 21:31:48 UTC

Individual Bayes (Re: I need professional help)

On Mon, 14 Jul 2014 13:24:10 -0600
Bob Proulx <bo...@proulx.com> wrote:

> And since this appears to be at the global MTA stage in a milter
> that it will always be less effective globally than an
> individualized Bayes database.

Not necessarily.  We have a giant Bayes database based on feedback
from our customers (it holds tokens from about 3.6 million messages
each of ham and spam) and it gave a 99% likelihood of spam when I fed
it the message at http://pastebin.com/Feete78K.

The key is to have a rich corpus of hand-trained mail for Bayes.
Having individualized Bayes databases is much less important than most
people think; in our experience, most people agree on what's ham
vs. what's spam.  The real win for individualized Bayes databases
comes from people working in specialized fields where the jargon
associated with the field is a strong ham indicator.

In other words, individualized Bayes databases help quite
a bit to detect ham, but don't help that much to detect
spam.
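
To make that concrete, here is a toy sketch in Python.  It is plain
naive Bayes with add-one smoothing, not SpamAssassin's actual Bayes
implementation, and the tiny corpora, the function names (train,
spam_probability) and the sample messages are all made up for
illustration.  It scores one jargon-heavy ham message and one obvious
spam against a shared database, then again after adding one user's own
ham to it.

```python
import math
from collections import Counter


def train(messages):
    """Count, for each token, how many messages it appears in."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def spam_probability(text, ham_counts, spam_counts, n_ham, n_spam):
    """Naive Bayes with add-one smoothing, combined in log-odds space."""
    log_odds = math.log(n_spam / n_ham)  # prior odds from corpus sizes
    for token in set(text.lower().split()):
        p_token_spam = (spam_counts[token] + 1) / (n_spam + 2)
        p_token_ham = (ham_counts[token] + 1) / (n_ham + 2)
        log_odds += math.log(p_token_spam / p_token_ham)
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical shared corpus (stands in for the global database).
global_ham = ["meeting agenda monday", "lunch on friday",
              "project status report"]
global_spam = ["cheap pills online", "buy cheap pills now",
               "free offer buy now"]

# Hypothetical ham from one user in a specialized field.
user_ham = ["dosage of pills for the trial patients",
            "trial patients dosage report"]

jargon_mail = "pills dosage trial patients"
obvious_spam = "buy cheap pills now"


def score(text, ham, spam):
    return spam_probability(text, train(ham), train(spam),
                            len(ham), len(spam))


# Shared database only.
global_jargon = score(jargon_mail, global_ham, global_spam)
global_spam_score = score(obvious_spam, global_ham, global_spam)

# Shared database plus the user's own ham.
personal_ham = global_ham + user_ham
personal_jargon = score(jargon_mail, personal_ham, global_spam)
personal_spam_score = score(obvious_spam, personal_ham, global_spam)

print("jargon mail:  global %.3f  personal %.3f"
      % (global_jargon, personal_jargon))
print("obvious spam: global %.3f  personal %.3f"
      % (global_spam_score, personal_spam_score))
```

With this made-up data the jargon mail drops from the spammy side of
0.5 to well below it once the user's ham is included, while the score
for the obvious spam hardly moves.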

Regards,

David.