Posted to users@spamassassin.apache.org by Jean-Paul Natola <jn...@familycareintl.org> on 2008/03/28 20:45:42 UTC
-2.6 bayes_00
Why does this hit on the most OBVIOUS messages?
It's almost an oxymoron.
How can all these rules get triggered
0.6 J_CHICKENPOX_34 BODY: 3alpha-pock-4alpha
0.6 J_CHICKENPOX_64 BODY: 6alpha-pock-4alpha
0.6 J_CHICKENPOX_82 BODY: 8alpha-pock-2alpha
-2.6 BAYES_00 BODY: Bayesian spam probability is 0 to 1% [score: 0.0000]
1.4 ADVANCE_FEE_2 Appears to be advance fee fraud (Nigerian 419)
1.7 SARE_FRAUD_X3 Matches 3+ phrases commonly used in fraud spam
1.7 SARE_FRAUD_X4 Matches 4+ phrases commonly used in fraud spam
0.1 TO_CC_NONE No To: or Cc: header
And in the midst of all that, it comes up with BAYES_00?
j
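To see why the BAYES_00 hit matters here, the scores in the report above can simply be added up. The sketch below does that with awk. The 5.0 threshold is SpamAssassin's stock required_score and is only an assumption, since the poster's actual setting is not given in the thread.

```shell
# Rule scores copied from the report above (score, rule name).
report='0.6 J_CHICKENPOX_34
0.6 J_CHICKENPOX_64
0.6 J_CHICKENPOX_82
-2.6 BAYES_00
1.4 ADVANCE_FEE_2
1.7 SARE_FRAUD_X3
1.7 SARE_FRAUD_X4
0.1 TO_CC_NONE'

# Assumption: the stock required_score of 5.0; the real threshold is not stated.
threshold=5.0

# Sum every score, and separately sum everything except the BAYES_* rule.
totals=$(printf '%s\n' "$report" | awk -v t="$threshold" '
    { total += $1 }
    $2 !~ /^BAYES_/ { rest += $1 }
    END {
        printf "total=%.1f without_bayes=%.1f threshold=%.1f\n", total, rest, t
    }')
echo "$totals"
```

With these numbers the message totals 4.1, while the other rules on their own would have reached 6.7, so under the assumed threshold the -2.6 from BAYES_00 is what lets it through.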
Re: -2.6 bayes_00
Posted by Matt Kettler <mk...@verizon.net>.
Jean-Paul Natola wrote:
> I have a site-wide config, as I only filter the mail and pass it on to
> Exchange; no individual users are set up
>
Ok, so you're using a bayes_path and bayes_file_mode in your config?
Or are you always force-running SA as one non-root user, and su'ing to
that user for your training?
>
>
> I run sa-learn --spam --showdots
Seems reasonable, as long as one of the above is true and you're
feeding it real messages, i.e. not forwarded or otherwise mangled by
Exchange. You need the full, raw message with complete original mail
headers.
You might want to try running it through spamassassin on the command
line and make sure it matches BAYES_00 there too. If it matches BAYES_99
on the command line, but BAYES_00 at delivery time, there's something
that isn't matching up between your training and the inbound email.
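A rough sketch of that comparison, for anyone who wants to script it. The file name message.txt and the helper bayes_rule are made up for illustration, and it assumes the filter adds an X-Spam-Status header at delivery time, which depends on the local setup.

```shell
# Print the first BAYES_NN rule name found on stdin (prints nothing if none).
bayes_rule() {
    grep -o 'BAYES_[0-9][0-9]*' | head -n 1
}

# Hypothetical file name: a raw copy of one delivered message, full headers intact.
msg="message.txt"

if command -v spamassassin >/dev/null 2>&1 && [ -f "$msg" ]; then
    # What the filter recorded at delivery time: look only at the header
    # section (everything up to the first blank line).
    delivered=$(sed '/^$/q' "$msg" | bayes_rule)
    # What a fresh command-line scan says; -t appends the rule report.
    cli=$(spamassassin -t < "$msg" | bayes_rule)
    echo "delivery: ${delivered:-none}  command line: ${cli:-none}"
fi
```

If the two names differ, the command-line run and the delivery-time run are most likely not reading the same Bayes data.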
RE: -2.6 bayes_00
Posted by Jean-Paul Natola <jn...@familycareintl.org>.
I have a site-wide config, as I only filter the mail and pass it on to
Exchange; no individual users are set up.
I run sa-learn --spam --showdots
J
________________________________
From: Matt Kettler [mailto:mkettler_sa@verizon.net]
Sent: Sun 3/30/2008 21:11
To: Jean-Paul Natola
Cc: users@spamassassin.apache.org
Subject: Re: -2.6 bayes_00
Re: -2.6 bayes_00
Posted by Matt Kettler <mk...@verizon.net>.
Jean-Paul Natola wrote:
> I've trained SA with about 12000 messages that have made it through the
> filters; I last trained 1 week ago
>
Any chance you're training a different database than SA uses at delivery
time?
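One way to check this is to compare the learned-message counts that each account sees. The sketch below wraps sa-learn --dump magic; the helper name bayes_counts is made up, and the awk pattern assumes the usual dump layout, where the value sits in the third column of the "non-token data" lines.

```shell
# Reduce "sa-learn --dump magic" output to the learned message counts.
# Assumption: the usual layout, where the value is the third column and the
# line ends in "non-token data: nspam" or "non-token data: nham".
bayes_counts() {
    awk '/non-token data: (nspam|nham)$/ { printf "%s=%s\n", $NF, $3 }'
}

if command -v sa-learn >/dev/null 2>&1; then
    # Run this once as the account used for training and once as the account
    # the mail filter runs under; the two outputs should be identical.
    echo "database seen by $(id -un):"
    sa-learn --dump magic | bayes_counts
fi
```

If the training account reports thousands of learned spam messages and the filter account reports few or none, the two are using different databases, which is the situation a site-wide bayes_path setting is meant to avoid.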
RE: -2.6 bayes_00
Posted by Jean-Paul Natola <jn...@familycareintl.org>.
I've trained SA with about 12000 messages that have made it through the
filters; I last trained 1 week ago.
Jean-Paul
________________________________
From: Matt Kettler [mailto:mkettler_sa@verizon.net]
Sent: Sun 3/30/2008 19:22
To: Jean-Paul Natola
Cc: users@spamassassin.apache.org
Subject: Re: -2.6 bayes_00
Re: -2.6 bayes_00
Posted by Matt Kettler <mk...@verizon.net>.
Jean-Paul Natola wrote:
> Why does this hit on the most OBVIOUS messages?
>
> It's almost an oxymoron
>
Well, it's your responsibility to train your bayes database. It's
hitting BAYES_00 because it closely matches your nonspam training.
You can start correcting it by using sa-learn --spam on some of the
misclassified messages. Or if this is a pervasive problem, you might
want to wipe out your bayes database completely with sa-learn --clear
and start from scratch.
Generally you can find out what tokens a particular message is using by
feeding it to spamassassin with bayes debugging enabled.
spamassassin -D bayes < message.txt
However, this doesn't seem to be working on my test box (probably my own
fault, I've been tinkering a bit much lately).
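The correction step above could be scripted roughly as follows. This is only a sketch: the directory names are hypothetical, the run and retrain helpers are made up, and the --ham pass is not something the thread asks for (it is included on the assumption that the database needs good mail as well as spam). By default it only prints the commands.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1: directory of missed spam, $2: directory of good mail (both hypothetical).
retrain() {
    run sa-learn --spam --showdots "$1"   # teach it the misclassified spam
    run sa-learn --ham --showdots "$2"    # assumption: also feed it known-good mail
    run sa-learn --dump magic             # check the counts afterwards
}

retrain ./missed-spam ./good-mail
```

Setting DRY_RUN=0 runs the commands for real. When chasing tokens with spamassassin -D bayes, note that the debug output goes to stderr, so it can be captured with 2> if needed.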
Re: -2.6 bayes_00
Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 28.03.08 15:45, Jean-Paul Natola wrote:
> Why does this hit on the most OBVIOUS messages?
What's obvious? The score may indicate an FP as well as an FN.
> It's almost an oxymoron
>
> How can all these rules get triggered
Quite easily. The *chickenpox* rules often hit non-English text.
BAYES must be trained; otherwise it might start hitting _00 as new
spam phrases appear and old ones disappear.
> 0.6 J_CHICKENPOX_34 BODY: 3alpha-pock-4alpha
> 0.6 J_CHICKENPOX_64 BODY: 6alpha-pock-4alpha
> 0.6 J_CHICKENPOX_82 BODY: 8alpha-pock-2alpha
> -2.6 BAYES_00 BODY: Bayesian spam probability is 0 to 1% [score: 0.0000]
> 1.4 ADVANCE_FEE_2 Appears to be advance fee fraud (Nigerian 419)
> 1.7 SARE_FRAUD_X3 Matches 3+ phrases commonly used in fraud spam
> 1.7 SARE_FRAUD_X4 Matches 4+ phrases commonly used in fraud spam
> 0.1 TO_CC_NONE No To: or Cc: header
--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Windows 2000: 640 MB ought to be enough for anybody