Posted to users@spamassassin.apache.org by Steve Sobel <st...@openingbands.com> on 2004/11/26 20:31:38 UTC

bayes_xx rules - stupid newbie question

[off topic from the rest of my post:  wow spamd uses a lot of memory! I
limited it to 5 processes because each one is 22-26 megs!]

Okay, I can't seem to find anything on the bayes_xx rules (bayes_20,
bayes_50, etc) via google.  My apologies but I cannot find a reasonable
"FM" to read, basically.

I'm using 3.0.1 now, and on the plus side of things, 100% of the mail that
SA is marking as spam is spam - NO false positives thus far.  Hooray for
that.

On the minus side, no matter how many times I send some messages to my
"Learn Spam" folder (where it's processed and emptied nightly), certain
messages I get many times a day still are not marked as spam.  Mostly
rolex watch spams, but there are others as well.

On all of these messages, I've noticed rules like BAYES_00, BAYES_20,
etc., which I'm assuming are "score droppers" that reduce the spam score
of an email.

How can I find out what triggers these rules, and stop it from happening
on these emails?  Where is the bayes database even stored by default? (I
certainly haven't changed it, so it should be there).

I'm sure this is an elementary question, but I swear to you helpful
readers I've googled and can't find squat.  Any help would be appreciated.
Thanks.

Steve

Re: bayes_xx rules - stupid newbie question

Posted by Kai Schaetzl <ma...@conactive.com>.
Steve Sobel wrote on Fri, 26 Nov 2004 13:31:38 -0600 (CST):

> How can I find out what triggers these rules, and stop it from happening 
> on these emails?
>

These are Bayes scores; go to the documentation and read about Bayes. 
BAYES_10 means the message was considered spam with a probability between 
roughly 5 and 15 percent (I think; it gets rounded). BAYES_50 means the 
probability is about 50:50, so no decision can be made. Something approaching 
0% is likely ham and something approaching 100% (BAYES_99) is likely spam. 
This is all based on statistical algorithms that extract text tokens from 
messages, aggregate them, and then compare incoming messages against that 
collection of tokens. It's commonly referred to as "Bayes" because the 
underlying theorem was worked out by Thomas Bayes (not for spam, but for 
statistical analysis).
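
To make the bucket idea concrete, here is a minimal Python sketch that maps a
Bayes probability to a BAYES_xx rule name. The ranges below are an assumption
modelled on what I believe the stock 3.0 rule file (23_bayes.cf) uses; they may
differ between versions, so check the rules shipped with your installation.
The function name and the handling of values that sit exactly on a boundary
are my own choices for illustration.

```python
# Assumed bucket ranges (low inclusive, high exclusive). These are modelled on
# the stock SpamAssassin 3.0 rules and should be verified locally.
BAYES_BUCKETS = [
    (0.00, 0.01, "BAYES_00"),
    (0.01, 0.05, "BAYES_05"),
    (0.05, 0.20, "BAYES_20"),
    (0.20, 0.40, "BAYES_40"),
    (0.40, 0.60, "BAYES_50"),
    (0.60, 0.80, "BAYES_60"),
    (0.80, 0.95, "BAYES_80"),
    (0.95, 0.99, "BAYES_95"),
    (0.99, 1.00, "BAYES_99"),
]


def bayes_rule_for(probability):
    """Return the name of the bucket that contains the given spam probability."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for low, high, name in BAYES_BUCKETS:
        if low <= probability < high:
            return name
    # Only a probability of exactly 1.0 falls through the loop.
    return BAYES_BUCKETS[-1][2]
```

With these assumed ranges, bayes_rule_for(0.5) returns "BAYES_50". A spam that
keeps hitting BAYES_00 or BAYES_20 is one the database currently rates as
probably ham.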


Kai

-- 

Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org




Re: bayes_xx rules - stupid newbie question

Posted by Jon Drukman <js...@cluttered.com>.
Steve Sobel wrote:
> On the minus side, no matter how many times I send some messages to my
> "Learn Spam" folder (where it's processed and emptied nightly), certain
> messages I get many times a day still are not marked as spam.  Mostly
> rolex watch spams, but there are others as well.

have you trained at least 200 spam and 200 ham?  until you hit that 
point, spamassassin only operates in "rules" mode.
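
One way to check the training counts is to look at the nspam and nham lines
printed by "sa-learn --dump magic". The sketch below parses that output in
Python. The column layout in SAMPLE is an assumption based on what the command
prints on my understanding of it, and the sample numbers are made up; the
function names are hypothetical. The 200/200 defaults mirror the thresholds
mentioned above (I believe they correspond to the bayes_min_spam_num and
bayes_min_ham_num settings).

```python
MAGIC_PREFIX = "non-token data: "


def training_counts(dump_text):
    """Pull nspam and nham out of `sa-learn --dump magic` style output."""
    counts = {}
    for line in dump_text.splitlines():
        # Assumed layout: probability, spam count, ham count, atime, label.
        fields = line.split(None, 4)
        if len(fields) == 5 and fields[4].startswith(MAGIC_PREFIX):
            key = fields[4][len(MAGIC_PREFIX):].strip()
            if key in ("nspam", "nham"):
                counts[key] = int(fields[2])
    return counts


def bayes_is_active(counts, min_spam=200, min_ham=200):
    """True once both training counts have reached their minimums."""
    return (counts.get("nspam", 0) >= min_spam
            and counts.get("nham", 0) >= min_ham)


# Made-up sample in the assumed format, for illustration only.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        150          0  non-token data: nspam
0.000          0        420          0  non-token data: nham
"""
```

In practice you would pass in the text captured from the real command rather
than SAMPLE. Note that if BAYES_xx rules already show up on your mail, Bayes
is past this threshold and the problem lies in what has been learned.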

> On all of these messages, I've noticed rules like BAYES_00, BAYES_20,
> etc., which I'm assuming are "score droppers" that reduce the spam score
> of an email.

right, if you mark a message as ham, sa thinks that future messages 
which contain similar words are more likely to be good.
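
A toy Python sketch of that effect follows. It is a simplified naive Bayes
with add-one smoothing and equal priors, written only to show how learning a
message as ham pulls down the score of similar messages. It is not
SpamAssassin's actual tokenizer or combining method, and the class and method
names are hypothetical.

```python
import math
import re
from collections import Counter


class ToyBayes:
    """Illustrative token-counting classifier, not SpamAssassin's algorithm."""

    def __init__(self):
        self.spam_tokens = Counter()  # token -> number of spam messages seen in
        self.ham_tokens = Counter()   # token -> number of ham messages seen in
        self.nspam = 0
        self.nham = 0

    @staticmethod
    def tokenize(text):
        # Unique lowercase words; real tokenizers are far more elaborate.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def learn(self, text, is_spam):
        tokens = self.tokenize(text)
        if is_spam:
            self.spam_tokens.update(tokens)
            self.nspam += 1
        else:
            self.ham_tokens.update(tokens)
            self.nham += 1

    def probability(self, text):
        """Spam probability from summed per-token log-odds (equal priors)."""
        log_odds = 0.0
        for token in self.tokenize(text):
            p_spam = (self.spam_tokens[token] + 1) / (self.nspam + 2)
            p_ham = (self.ham_tokens[token] + 1) / (self.nham + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1.0 / (1.0 + math.exp(-log_odds))
```

Train it on a few spam and ham messages, score a message, then learn a
spam-like message as ham and score again: the second score comes out lower.
That is the same reason a mislearned ham can drag real spam into the low
BAYES_xx buckets.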

> How can I find out what triggers these rules, and stop it from happening
> on these emails?  Where is the bayes database even stored by default? (I
> certainly haven't changed it, so it should be there).

the bayes db is in $HOME/.spamassassin/bayes_* by default.  you can 
remove those files (if you want to start training from scratch) or use 
sa-learn to manipulate them.
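
To confirm the files are where you expect, a small Python sketch like this
lists whatever matches that default path. The function name is hypothetical,
and it assumes the default location given above; if spamd runs as a different
user or bayes_path has been set, the files will be somewhere else.

```python
import glob
import os


def bayes_db_files(home=None):
    """List files matching <home>/.spamassassin/bayes_* (the default path)."""
    if home is None:
        home = os.path.expanduser("~")
    pattern = os.path.join(home, ".spamassassin", "bayes_*")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    for path in bayes_db_files():
        print(path, os.path.getsize(path), "bytes")
```

An empty list usually means the database lives under another account's home
directory, or that nothing has been learned yet. On the sa-learn side,
"sa-learn --dump magic" prints the database summary and "sa-learn --clear"
wipes it if you want to start training from scratch.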

-jsd-