Posted to users@spamassassin.apache.org by omehegan <ow...@nerdnetworks.org> on 2007/07/05 20:17:04 UTC

Bayes suddenly scoring everything at 0

I'm running SA 3.2.1 with Postfix, routing mail to it through spamd/spamc. I
have a site-wide Bayesian database that I trained some time ago with a few
hundred hams, and since then I've trained spam into it whenever I
received a false negative. With the recent influx of PDF and stock spam,
I've been updating rules and tweaking settings to get SA to catch them. I
noticed something interesting - all the spam I've gotten in at least the
last few days has scored 0 on Bayes. That's causing SA to drop the message's
score by 2.6 points, throwing other filters off-balance, so to speak. I'm
wondering if this is happening because I've been dutifully teaching these
stock spam messages into the database. They're full of nonsense words, and
although I think I've been told on this list that it's OK to submit them, it
seems like that could reduce Bayes' reliability. Or maybe I just need to
refresh the database with a slew of new ham messages.

Attached is a spam I got today, which got good hits in other tests but 0
probability in Bayes. Any suggestions on how to remedy this would be
appreciated. Thanks!

http://www.nabble.com/file/p11451717/spam.txt
-- 
View this message in context: http://www.nabble.com/Bayes-suddenly-scoring-everything-at-0-tf4031385.html#a11451717
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Re: Bayes suddenly scoring everything at 0

Posted by Bob Proulx <bo...@proulx.com>.
omehegan wrote:
> Any other thoughts on this? I got another 5-6 spams this morning that were
> scored 0 by Bayes. It's dragging down the hits from other rules!

I have seen this too when the volume of spam to learn from is much
larger than the volume of non-spam and the total quantity of learned
messages is very large.  When the database is very large,
training-on-error has little effect because it would take many errors
to correct it.  When it is unbalanced, with more spam than ham, it is
overwhelmed by the "badness" of the universe and does not have enough
"goodness" to offset it.
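
To make the imbalance point concrete, here is a rough sketch of how a
per-token estimate can shift when the ham and spam totals are lopsided.
It uses a simplified Graham-style ratio, not SpamAssassin's actual Bayes
code (which applies further corrections), and all of the counts are
made up for illustration.

```python
def token_spam_probability(spam_hits, ham_hits, nspam, nham):
    """Simplified Graham-style estimate of how spammy one token is, in [0, 1].

    spam_hits / ham_hits: learned spam / ham messages containing the token.
    nspam / nham: total learned spam / ham messages in the database.
    This is an illustration only, not SpamAssassin's real formula.
    """
    spam_ratio = spam_hits / nspam
    ham_ratio = ham_hits / nham
    if spam_ratio + ham_ratio == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_ratio / (spam_ratio + ham_ratio)


# Hypothetical database: 300 hams vs. 30,000 spams learned.
# A filler word seen in 30 spams and in just 1 ham still leans hammy,
# because 1/300 is a larger fraction than 30/30000.
lopsided = token_spam_probability(spam_hits=30, ham_hits=1, nspam=30000, nham=300)

# Same token counts, but with 30,000 messages learned on each side.
balanced = token_spam_probability(spam_hits=30, ham_hits=1, nspam=30000, nham=30000)

print(f"lopsided: {lopsided:.3f}  balanced: {balanced:.3f}")
```

With these invented numbers the same token looks hammy in the lopsided
database and strongly spammy in the balanced one, even though its raw
hit counts are identical.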

When I have been in those situations I have eventually given up,
reset the Bayes database, and started over.  When I have between
1,000 and 100,000 messages learned the system is always quite good.
It is only when the number of messages becomes much larger that I see
those types of problems.
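
One way to check where a database stands is to read the nspam and nham
counters printed by "sa-learn --dump magic".  The sketch below parses
text in that layout; the sample counts are invented, and the helper
names and the use of the 1,000 to 100,000 range as a hard threshold are
assumptions for illustration, not a SpamAssassin rule.

```python
import re

# Sample text in the layout printed by `sa-learn --dump magic`.
# The counts are invented for illustration.
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0     412350          0  non-token data: nspam
0.000          0        310          0  non-token data: nham
0.000          0     158000          0  non-token data: ntokens
"""

COUNT_LINE = re.compile(r"\S+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (nspam|nham)$")


def read_counts(dump_text):
    """Return {'nspam': int, 'nham': int} parsed from the dump text."""
    counts = {}
    for line in dump_text.splitlines():
        match = COUNT_LINE.match(line.strip())
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def looks_unhealthy(counts, low=1_000, high=100_000):
    """Flag a learned-message total outside the low..high range (assumed cutoffs)."""
    total = counts["nspam"] + counts["nham"]
    return total < low or total > high


counts = read_counts(SAMPLE_DUMP)
print(counts, "reset candidate:", looks_unhealthy(counts))
```

If a reset does look warranted, "sa-learn --backup" writes the existing
database to standard output and "sa-learn --clear" wipes it, so a copy
can be saved before starting over.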

Bob


Re: Bayes suddenly scoring everything at 0

Posted by omehegan <ow...@nerdnetworks.org>.
Any other thoughts on this? I got another 5-6 spams this morning that were
scored 0 by Bayes. It's dragging down the hits from other rules!




Re: Bayes suddenly scoring everything at 0

Posted by omehegan <ow...@nerdnetworks.org>.
I should note that autolearn is turned on, and is apparently learning about
half of my legit messages as ham, so that's cool. Furthermore, the spams
that are getting through are showing as autolearn=no, so that's good as
well. Seems less likely, then, that a stale database of ham messages is
causing my problem.
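
For anyone who wants to check the same thing across a whole mailbox, a
small script can tally the BAYES_* hit and the autolearn value from each
X-Spam-Status header. This is only a sketch: the sample headers are
invented (not taken from my mail), they are shown unfolded on one line,
and the function name is hypothetical.

```python
import re
from collections import Counter

# Invented headers in the usual X-Spam-Status layout, unfolded to one line each.
SAMPLE_HEADERS = [
    "X-Spam-Status: No, score=3.1 required=5.0 tests=BAYES_00,HTML_MESSAGE autolearn=no version=3.2.1",
    "X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.2.1",
    "X-Spam-Status: Yes, score=9.4 required=5.0 tests=BAYES_99,HTML_MESSAGE autolearn=no version=3.2.1",
]


def summarize(headers):
    """Count (BAYES rule, autolearn value) pairs across X-Spam-Status headers."""
    tally = Counter()
    for header in headers:
        bayes = re.search(r"\bBAYES_\d+\b", header)
        learn = re.search(r"\bautolearn=(\S+)", header)
        key = (bayes.group(0) if bayes else "none",
               learn.group(1) if learn else "unknown")
        tally[key] += 1
    return tally


for (rule, learned), n in sorted(summarize(SAMPLE_HEADERS).items()):
    print(f"{rule:10} autolearn={learned:8} {n}")
```

A pile of BAYES_00 entries with autolearn=no on messages known to be
spam would match what I'm seeing here.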


Re: Bayes suddenly scoring everything at 0

Posted by Alex Woick <al...@wombaz.de>.
> I have a site-wide Bayesian database that I trained some time ago with a few
> hundred hams, and then since then I've trained spam into it anytime I
> received a false negative.
[...]
> I noticed something interesting - all the spam I've gotten in at least the
> last few days has scored 0 on Bayes.

I am continuously learning everything for Bayes. I have autolearn on
(it's the default) and am explicitly learning all unlearned ham and spam
accordingly, including FPs. But that's only my account. The other users
don't train on their mail, so for them only autolearn applies.

Almost all spams that are half "content" and half random text score
BAYES_99, so I think that's the way to do it. Whenever I look at the
spam scores, I see BAYES_99 on spam and BAYES_50 or lower on ham.
It's important to continuously learn everything so the system
adapts to new mail characteristics. No mail is more "important" to
learn than any other. Every mail is equally important.

To help Bayes distinguish between spam and ham, I have subscribed to a
few technical, medium-traffic, spam-free mailing lists, even though I
don't read them regularly. Otherwise, the ham count would be a bit too
low, in my opinion.