Posted to users@spamassassin.apache.org by Robert Case <co...@arlyle.com> on 2008/07/04 02:00:13 UTC

Why are BAYES_00 to BAYES_40 scores negative?

I'm going to ask a really silly question...

First, my particulars:
Fedora Core 8 x86_64
Qmail 1.03 (Running a Modified QmailRocks configuration, which is everything
except vpopmail)
Qscan
ClamAV
SpamAssassin 3.2.4

I periodically audit messages that get through SpamAssassin to see why they
didn't reach the score threshold (mine is set at 3.5).  I compare the
messages with the scoring details that get logged in "maillog".

I noticed that many of the messages that got through were hitting the
BAYES_00 through BAYES_40 rules.  I looked at the rules page, and the scores
for those rules are negative (ranging from -2.599 (eek!) to -0.185).  When
you get to BAYES_50 and higher, the scores turn positive.  Also, in many
instances, the negative BAYES_* scores made the difference between reaching
the threshold and not.

My question is WHY are those rules negative?

I went ahead and assigned a positive score for those rules (ranging from
0.001 to 0.040), but I figured I had better ask here why those scores are
negative.  I'm figuring there's a good reason, and I don't want to shoot
myself in the foot.

Thanks,

Robert...


Re: Why are BAYES_00 to BAYES_40 scores negative?

Posted by Matt Kettler <mk...@verizon.net>.
Robert Case wrote:
> I'm going to ask a really silly question...
>
> First, my particulars:
> Fedora Core 8 x86_64
> Qmail 1.03 (Running a Modified QmailRocks configuration, which is everything
> except vpopmail)
> Qscan
> ClamAV
> SpamAssassin 3.2.4
>
> I periodically audit messages that get through SpamAssassin to see why they
> didn't reach the score threshold (mine is set at 3.5).  I compare the
> messages with the scoring details that get logged in "maillog".
>
> I noticed that many of the messages that got through were hitting the
> BAYES_00 through BAYES_40 rules. 
>   
As several have pointed out already, BAYES_00 is a very strong 
indication that the message matches your non-spam training. Anything under 50 
indicates the message is more likely not spam, and the lower the 
number, the more likely it is to be non-spam. (In general, the two digits 
are the percent chance the message is spam: 00 means a 0% chance it's 
spam, and therefore a 100% chance it's not; 40 means a 40% chance of spam, 
and therefore a 60% chance it's not.)
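
To make the naming concrete, here is a rough Python sketch of how a Bayes probability could be mapped to a rule name. The exact range boundaries below are my assumption about the stock rule set, not something stated in this thread, and the function name bayes_rule_for is made up for illustration; check the rule definitions in your own install before relying on them.

```python
# Assumed probability ranges for the stock BAYES_* rules (not taken from
# this thread). Each entry is (exclusive upper bound, rule name).
BAYES_BUCKETS = [
    (0.01, "BAYES_00"),
    (0.05, "BAYES_05"),
    (0.20, "BAYES_20"),
    (0.40, "BAYES_40"),
    (0.60, "BAYES_50"),
    (0.80, "BAYES_60"),
    (0.95, "BAYES_80"),
    (0.99, "BAYES_95"),
    (1.00, "BAYES_99"),
]


def bayes_rule_for(probability):
    """Return the BAYES_* rule whose assumed range contains the probability."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for upper_bound, rule_name in BAYES_BUCKETS:
        if probability < upper_bound:
            return rule_name
    # Only a probability of exactly 1.0 falls through the loop.
    return "BAYES_99"
```

Under these assumed ranges the number in the rule name is a bucket label rather than an exact percentage, which is consistent with the "in general" above.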

If you've got a significant amount of spam matching the low-numbered Bayes 
rules, you should re-examine your Bayes training.


Re: Why are BAYES_00 to BAYES_40 scores negative?

Posted by Sahil Tandon <sa...@tandon.net>.
Robert Case <co...@arlyle.com> wrote:

> I periodically audit messages that get through SpamAssassin to see why they
> didn't reach the score threshold (mine is set at 3.5).  I compare the
> messages with the scoring details that get logged in "maillog".
> 
> I noticed that many of the messages that got through were hitting the
> BAYES_00 through BAYES_40 rules.  I looked at the rules page, and the scores
> for those rules are negative (ranging from -2.599 (eek!) to -0.185).  When
> you get to BAYES_50 and higher, the scores turn positive.  Also, in many
> instances, the negative BAYES_* scores made the difference between reaching
> the threshold and not.
> 
> My question is WHY are those rules negative?

Because Bayesian rules are not only supposed to stop spam, but also to help ham 
get through your filter.  Your Bayesian database thinks those spammy mails 
have hammy attributes.  You can try sa-learning those emails as spam so SA will 
eventually start assigning a positive score to similar emails in the future.
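
If you want to script that retraining, a minimal Python sketch might look like the following. It only wraps the sa-learn command with its --spam and --ham options; the helper names and the message path are hypothetical, and it assumes sa-learn is on your PATH and is run as the same user whose Bayes database SpamAssassin consults when scanning.

```python
import subprocess


def build_sa_learn_command(message_path, is_spam):
    """Build the sa-learn argument list for a single message file."""
    flag = "--spam" if is_spam else "--ham"
    return ["sa-learn", flag, message_path]


def train(message_path, is_spam):
    """Run sa-learn on one message; raises CalledProcessError on failure."""
    subprocess.run(build_sa_learn_command(message_path, is_spam), check=True)


# Example (hypothetical path to a spam message that slipped through):
# train("/home/robert/missed-spam/0001.eml", is_spam=True)
```

Building the command as a list rather than a shell string avoids quoting problems with odd file names.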

[...]

-- 
Sahil Tandon <sa...@tandon.net>

Re: Why are BAYES_00 to BAYES_40 scores negative?

Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Jul 03, 2008 at 05:00:13PM -0700, Robert Case wrote:
> I noticed that many of the messages that got through were hitting the
> BAYES_00 through BAYES_40 rules.  I looked at the rules page, and the scores
> for those rules are negative (ranging from -2.599 (eek!) to -0.185).  When
> you get to BAYES_50 and higher, the scores turn positive.  Also, in many
> instances, the negative BAYES_* scores made the difference between reaching
> the threshold and not.

Yep.

> My question is WHY are those rules negative?

Because they are ham detection rules.

> I went ahead and assigned a positive score for those rules (ranging from
> 0.001 to 0.040), but I figured I had better ask here why those scores are
> negative.  I'm figuring there's a good reason, and I don't want to shoot
> myself in the foot.

Bayes provides a probability of a message being spam.  Therefore: 50% is "not
sure either way", 0% is "not spam", 100% is "definitely spam".
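
To see what the override does in practice, here is a small arithmetic sketch in Python. The threshold (3.5) and the two scores (-2.599 and 0.001) are the figures quoted above; I am assuming they both belong to BAYES_00, that a message is tagged once its total reaches the threshold, and the 4.0 total from other rules is a made-up example value.

```python
THRESHOLD = 3.5            # threshold quoted in the original post
BAYES_00_STOCK = -2.599    # assumed to be the stock BAYES_00 score
BAYES_00_OVERRIDE = 0.001  # assumed to be the override given to BAYES_00


def is_tagged_spam(other_rules_total, bayes_score, threshold=THRESHOLD):
    """Return True when the combined score reaches the threshold."""
    return other_rules_total + bayes_score >= threshold


# Hypothetical ham message that happens to trip 4.0 points of other rules
# but looks exactly like your trained non-spam (so it hits BAYES_00).
other_rules_total = 4.0

print(is_tagged_spam(other_rules_total, BAYES_00_STOCK))     # stock score
print(is_tagged_spam(other_rules_total, BAYES_00_OVERRIDE))  # overridden score
```

With the stock score the total is 1.401 and the message is delivered; with the override it is 4.001 and the message is tagged. In other words, the override removes the rule's ability to rescue legitimate mail, so fixing the training is the safer route.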

-- 
Randomly Selected Tagline:
"lp1 on fire" - Linux kernel error message