
Posted to users@spamassassin.apache.org by Jean-Paul Natola <jn...@familycareintl.org> on 2008/03/28 20:45:42 UTC

-2.6 bayes_00

Why does this hit on the most OBVIOUS messages?

It's almost an oxymoron.

How can all these rules get triggered:

 0.6 J_CHICKENPOX_34   BODY: 3alpha-pock-4alpha
 0.6 J_CHICKENPOX_64   BODY: 6alpha-pock-4alpha
 0.6 J_CHICKENPOX_82   BODY: 8alpha-pock-2alpha
-2.6 BAYES_00          BODY: Bayesian spam probability is 0 to 1%
                       [score: 0.0000]
 1.4 ADVANCE_FEE_2     Appears to be advance fee fraud (Nigerian 419)
 1.7 SARE_FRAUD_X3     Matches 3+ phrases commonly used in fraud spam
 1.7 SARE_FRAUD_X4     Matches 4+ phrases commonly used in fraud spam
 0.1 TO_CC_NONE        No To: or Cc: header


And in the midst of all that, it comes up with BAYES_00?

j
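
To make the effect of BAYES_00 on this report concrete, here is a small sketch that sums the score column exactly as posted, once with and once without the BAYES_00 line. The function name sum_scores is made up for this example, and the report text is retyped from the message above with the descriptions dropped.

```shell
#!/bin/sh
# Sum the leading score column of a rule report read from stdin.
# Lines that do not start with a decimal number are ignored.
sum_scores() {
    awk '$1 ~ /^-?[0-9]+\.[0-9]+$/ { total += $1 }
         END { printf "%.1f\n", total }'
}

# Scores and rule names as posted (descriptions omitted).
report='0.6 J_CHICKENPOX_34
0.6 J_CHICKENPOX_64
0.6 J_CHICKENPOX_82
-2.6 BAYES_00
1.4 ADVANCE_FEE_2
1.7 SARE_FRAUD_X3
1.7 SARE_FRAUD_X4
0.1 TO_CC_NONE'

with_bayes=$(printf '%s\n' "$report" | sum_scores)
without_bayes=$(printf '%s\n' "$report" | grep -v BAYES_00 | sum_scores)

echo "with BAYES_00:    $with_bayes"
echo "without BAYES_00: $without_bayes"
```

This prints 4.1 with BAYES_00 and 6.7 without it. Assuming the site uses SpamAssassin's default required_score of 5.0 (the thread does not say), the -2.6 from BAYES_00 is what keeps this message under the threshold.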

Re: -2.6 bayes_00

Posted by Matt Kettler <mk...@verizon.net>.

Jean-Paul Natola wrote:
> I have a site-wide config, as I only filter the mail and pass it on to
> Exchange; no individual users are set up.
>
Ok, so you're using a bayes_path and bayes_file_mode in your config?

Or are you always force-running SA as one non-root user, and su'ing to 
that user for your training?
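
For reference, a site-wide setup of the first kind might look roughly like the fragment below. bayes_path and bayes_file_mode are real SpamAssassin options, but the directory and the mode value are only illustrative assumptions, not taken from the poster's config.

```
# local.cf (site-wide): sketch only, the path is a hypothetical example
# bayes_path is a filename prefix; SA appends _toks, _seen, etc.
bayes_path      /var/spool/spamassassin/bayes/bayes
# group-writable so both the filter and the training user can update it
bayes_file_mode 0770
```

With something like this in place, training and delivery use the same files as long as both read the same local.cf and both users have write access to that directory.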

> I run sa-learn --spam --showdots
Seems reasonable, as long as one of the above is true and you're 
feeding it real messages, i.e. not forwarded or otherwise mangled by 
Exchange. You need the full, raw message with complete original mail 
headers.

You might want to try running it through spamassassin on the command 
line and make sure it matches BAYES_00 there too. If it matches BAYES_99 
on the command line, but BAYES_00 at delivery time, there's something 
that isn't matching up between your training and the inbound email.
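
One way to do that comparison is sketched below. It assumes you have saved two files: the output of a command-line run and the message as it was actually delivered (with its X-Spam-Status header). The helper names bayes_rule and compare_bayes are made up for this example.

```shell
#!/bin/sh
# Print the first BAYES_nn rule name found in a file (a saved message
# or a saved spamassassin report); prints nothing if none is present.
bayes_rule() {
    grep -o 'BAYES_[0-9][0-9]*' "$1" | head -n 1
}

# Compare two saved copies:
#   $1 = output of a command-line run
#   $2 = the message as delivered by the filter
compare_bayes() {
    cli=$(bayes_rule "$1")
    delivered=$(bayes_rule "$2")
    if [ "$cli" = "$delivered" ]; then
        echo "same: ${cli:-none}"
    else
        echo "differ: cli=${cli:-none} delivered=${delivered:-none}"
    fi
}
```

For example, save the output of "spamassassin -t < message.txt" as cli-report.txt, save the delivered copy as delivered.txt, and run "compare_bayes cli-report.txt delivered.txt". A "differ" result points at the training and delivery paths not using the same Bayes data.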

RE: -2.6 bayes_00

Posted by Jean-Paul Natola <jn...@familycareintl.org>.

I have a site-wide config, as I only filter the mail and pass it on to
Exchange; no individual users are set up.

I run sa-learn --spam --showdots

J

________________________________

From: Matt Kettler [mailto:mkettler_sa@verizon.net]
Sent: Sun 3/30/2008 21:11
To: Jean-Paul Natola
Cc: users@spamassassin.apache.org
Subject: Re: -2.6 bayes_00



Jean-Paul Natola wrote:
> I've trained SA with about 12,000 messages that have made it through the
> filters; I last trained one week ago.
> 

Any chance you're training a different database than SA uses at delivery
time?

Re: -2.6 bayes_00

Posted by Matt Kettler <mk...@verizon.net>.

Jean-Paul Natola wrote:
> I've trained SA with about 12,000 messages that have made it through the
> filters; I last trained one week ago.
>  

Any chance you're training a different database than SA uses at delivery 
time?
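
A rough way to check this is to run "sa-learn --dump magic" once as the user you train with and once as the user (or with the config) the filter runs under, save both outputs, and compare the message counters. The sketch below assumes the usual dump layout, where the counter value is the third column and the counter name (nspam, nham) is the last word on the line. The helper names magic_count and same_database are made up for this example.

```shell
#!/bin/sh
# Pull one counter (nspam, nham, ntokens) out of saved
# `sa-learn --dump magic` output.
#   $1 = file holding the saved dump, $2 = counter name
magic_count() {
    awk -v key="$2" '$NF == key { print $3 }' "$1"
}

# Report whether two saved dumps show the same training counters.
#   $1 = dump taken as the training user
#   $2 = dump taken as the delivery-time user
same_database() {
    for key in nspam nham; do
        if [ "$(magic_count "$1" "$key")" != "$(magic_count "$2" "$key")" ]; then
            echo "different ($key does not match)"
            return 0
        fi
    done
    echo "counts match"
}
```

Matching counts are only a hint that both sides see the same database, but a mismatch (for instance, thousands of learned spam on the training side and almost none on the delivery side) is a strong sign that two different databases are in play.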

RE: -2.6 bayes_00

Posted by Jean-Paul Natola <jn...@familycareintl.org>.

I've trained SA with about 12,000 messages that have made it through the
filters; I last trained one week ago.

Jean-Paul 

________________________________

From: Matt Kettler [mailto:mkettler_sa@verizon.net]
Sent: Sun 3/30/2008 19:22
To: Jean-Paul Natola
Cc: users@spamassassin.apache.org
Subject: Re: -2.6 bayes_00

Jean-Paul Natola wrote:
> Why does this hit on the most OBVIOUS  messages?
>
> It's almost an oxymoron.
>  

Well, it's your responsibility to train your bayes database. It's
hitting BAYES_00 because it closely matches your nonspam training.

You can start correcting it by using sa-learn --spam on some of the
misclassified messages. Or if this is a pervasive problem, you might
want to wipe out your bayes database completely with sa-learn --clear
and start from scratch.

Generally you can find out what tokens a particular message is using by
feeding it to spamassassin with bayes debugging enabled.

spamassassin -D bayes < message.txt

However, this doesn't seem to be working on my test box (probably my own
fault; I've been tinkering a bit much lately).

Re: -2.6 bayes_00

Posted by Matt Kettler <mk...@verizon.net>.

Jean-Paul Natola wrote:
> Why does this hit on the most OBVIOUS  messages?
>
> It's almost an oxymoron.
>   

Well, it's your responsibility to train your bayes database. It's 
hitting BAYES_00 because it closely matches your nonspam training.

You can start correcting it by using sa-learn --spam on some of the 
misclassified messages. Or if this is a pervasive problem, you might 
want to wipe out your bayes database completely with sa-learn --clear 
and start from scratch.

Generally you can find out what tokens a particular message is using by 
feeding it to spamassassin with bayes debugging enabled.

spamassassin -D bayes < message.txt

However, this doesn't seem to be working on my test box (probably my own 
fault; I've been tinkering a bit much lately).

Re: -2.6 bayes_00

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.

On 28.03.08 15:45, Jean-Paul Natola wrote:
> Why does this hit on the most OBVIOUS  messages?

What's obvious? The score may indicate a false positive (FP) as well as a
false negative (FN).

> It's almost an oxymoron.
> 
> How can all these rules get triggered

Quite easily. The *chickenpox* rules often hit non-English text.
BAYES must be kept trained; otherwise it may start hitting _00 as new
spam phrases appear and old ones disappear.
 
>  0.6 J_CHICKENPOX_34   BODY: 3alpha-pock-4alpha
>  0.6 J_CHICKENPOX_64   BODY: 6alpha-pock-4alpha
>  0.6 J_CHICKENPOX_82   BODY: 8alpha-pock-2alpha
> -2.6 BAYES_00          BODY: Bayesian spam probability is 0 to 1%
>                        [score: 0.0000]
>  1.4 ADVANCE_FEE_2     Appears to be advance fee fraud (Nigerian 419)
>  1.7 SARE_FRAUD_X3     Matches 3+ phrases commonly used in fraud spam
>  1.7 SARE_FRAUD_X4     Matches 4+ phrases commonly used in fraud spam
>  0.1 TO_CC_NONE        No To: or Cc: header
-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Windows 2000: 640 MB ought to be enough for anybody