Posted to users@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2004/09/07 23:15:35 UTC

Re: Bayes scoring weirdness?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Theo Van Dinter writes:
> On Tue, Sep 07, 2004 at 02:32:42PM -0600, Chris Blaise wrote:
> > 	The rules were ALL_TRUSTED,MISSING_DATE,USER_IN_BLACKLIST and I
> > think since "ALL_TRUSTED" is a negative value.
> > 
> > 	Am I missing something about how auto-learn should consider this?  
> > 
> > 	Is there a reason why it doesn't consider such a message as spam,
> > solely on the score?  I realize that it won't learn spam if the header and
> > body aren't at least 3 each, but for such a high score, it seems like it
> > should be able to disregard that to say, "This is huge; learn it as spam."
> 
> This has been covered many times.  It's probably in the wiki, and definitely
> in the documentation:
> 
> [...]
>        bayes_auto_learn ( 0 | 1 )      (default: 1)
> [...]
>            Note that certain tests are ignored when determining whether a
>            message should be trained upon:
> 
>             - rules with tflags set to 'learn' (the Bayesian rules)
> 
>             - rules with tflags set to 'userconf' (user white/black-listing
>               rules, etc)
> 
>             - rules with tflags set to 'noautolearn'
> 
>            Also note that auto-training occurs using scores from either
>            scoreset 0 or 1, depending on what scoreset is used during message
>            check.  It is likely that the message check and auto-train scores
>            will be different.
> 
> As always though, run with -D and you'll find out plenty. ;)
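
To make the quoted documentation concrete, here is a minimal Python sketch
of how an auto-learn score could differ from the full message score when
rules carrying the 'learn', 'userconf' or 'noautolearn' tflags are skipped.
This is not SpamAssassin's actual code: the function name, the data layout
and every score below are hypothetical and chosen only for illustration,
and the tflags shown on each rule are assumptions as well.

```python
# Tflags that exclude a rule from the auto-learn decision,
# as listed in the documentation excerpt quoted above.
IGNORED_TFLAGS = {"learn", "userconf", "noautolearn"}


def autolearn_score(hits):
    """Sum the scores of the rules that count toward auto-learning.

    `hits` is a list of (rule_name, score, tflags) tuples. This layout is
    hypothetical; it is not how SpamAssassin stores rule hits.
    """
    return sum(
        score
        for _name, score, tflags in hits
        if not (set(tflags) & IGNORED_TFLAGS)
    )


# Illustrative values only; these are NOT the real default scores,
# and the tflags assigned here are assumptions for the example.
hits = [
    ("ALL_TRUSTED", -2.8, set()),
    ("MISSING_DATE", 1.5, set()),
    ("USER_IN_BLACKLIST", 100.0, {"userconf"}),
]

full_score = sum(score for _name, score, _tflags in hits)
learn_score = autolearn_score(hits)

print(f"full score:       {full_score:.1f}")
print(f"auto-learn score: {learn_score:.1f}")
```

With these made-up numbers the blacklist hit dominates the full score but
contributes nothing to the auto-learn score, which is the effect described
in the original question.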

BTW the idea of USER_IN_BLACKLIST being ignored for bayes is so
that if a user screws up and accidentally BLs a ham source, it
won't pollute Bayes as well.

I think in 3.0.0 we've added more logic so that it won't be learned
*at all* in that situation -- not as ham or spam.

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFBPiT2QTcbUG5Y7woRAhR6AKC2yiFRhAWioYJCoyBH06vwEY5rfgCfdhzA
BNxtJ4X61mlXo4vvjyWlDwE=
=d/QW
-----END PGP SIGNATURE-----


Re: Bayes scoring weirdness?

Posted by Matt Kettler <mk...@evi-inc.com>.
At 05:15 PM 9/7/2004, Justin Mason wrote:
>BTW the idea of USER_IN_BLACKLIST being ignored for bayes is so
>that if a user screws up and accidentally BLs a ham source, it
>won't pollute Bayes as well.
>
>I think in 3.0.0 we've added more logic so that it won't be learned
>*at all* in that situation -- not as ham or spam.

Justin, since the ALL_TRUSTED test matched, one can infer that the user is 
running some form of 3.0 prerelease code, although not necessarily which 
release.

Also, looking at SA 3.0 RC3's PerMsgStatus.pm I don't see code to prevent 
learning of an email if it's blacklisted.

It's considerably changed from 3.0-pre4, but it doesn't appear to have any 
added criteria to prevent autolearning of black/whitelisted emails either.