Posted to dev@spamassassin.apache.org by Loren Wilton <lw...@earthlink.net> on 2005/07/29 11:29:56 UTC

Thoughts/ramblings on rule short circuiting

I was thinking about the 'best' way to shortcut running rules when they
weren't needed, and suddenly realized there might be cases where it is
necessary to run them even though they won't determine the hammyness or
spammyness of the mail.

In particular, I'm wondering about bayes and awl auto-learning.  I'm also
wondering about people that rank mail in some way by SA score.

Consider for argument a system with one -100 rule and a whole bunch of rules
that will score +100 total, bayes that will score a max of 4, and no awl.

It seems obvious that we want to run that -100 rule first.  If it hits, the
maximum possible score if *every* other rule hits will be 4, and with a
threshold of 5, the mail can't be spam.  So we can stop after the -100 rule
hits, and only run one rule on this mail.
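The stopping condition described above can be sketched as a bound check: before each rule, ask whether the rules not yet run could still move the score across the threshold. This is only an illustration of the idea, not how SpamAssassin schedules rules; the `Rule` class, the `scan` function, the rule names, and the split of the +100 into ten +10 rules are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    score: float
    hits: bool  # stand-in for actually evaluating the rule against a message


def scan(rules, threshold=5.0, short_circuit=True):
    """Run rules in order; optionally stop once the verdict cannot change.

    Returns (score_so_far, rules_run, is_spam).
    """
    score = 0.0
    # Bounds on what the rules not yet run could still add or subtract.
    max_gain = sum(r.score for r in rules if r.score > 0)
    max_loss = sum(r.score for r in rules if r.score < 0)
    rules_run = 0
    for rule in rules:
        cannot_be_spam = score + max_gain < threshold
        must_be_spam = score + max_loss >= threshold
        if short_circuit and (cannot_be_spam or must_be_spam):
            break
        rules_run += 1
        # This rule is no longer "pending", so remove it from the bounds.
        if rule.score > 0:
            max_gain -= rule.score
        else:
            max_loss -= rule.score
        if rule.hits:
            score += rule.score
    return score, rules_run, score >= threshold


# The system from the example: one -100 rule, +100 of other rules, bayes max 4.
rules = (
    [Rule("WHITELIST", -100.0, True)]
    + [Rule(f"SPAMMY_{i}", 10.0, True) for i in range(10)]
    + [Rule("BAYES", 4.0, True)]
)

print(scan(rules))                       # (-100.0, 1, False)
print(scan(rules, short_circuit=False))  # (4.0, 12, False)
```

Both runs reach the same not-spam verdict, but the reported scores are -100 and 4, which is exactly the discrepancy discussed next.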

But that will end up with the mail scored -100.  That certainly makes it
look like dyed-in-the-wool ham.

But what if, for argument, every other rule HAD hit?  The end score would be
4.  That makes the mail look rather questionable.  Especially if one looks
at the pages and pages of rule hits and realizes the score is really 104,
but was offset by the whitelist rule.

As far as the SA classification is concerned, the -100 score is sufficient.
But is it sufficient for humans?

Another thought: bayes is running and wants to auto-learn this thing.  Will
it auto-learn differently if the message has a score of -100 than if it had
a score of 4?  I assume it probably will.
Does this mean we can't shortcut rules if bayes autolearning is enabled?
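To make the auto-learning worry concrete, here is a minimal sketch that assumes auto-learning simply compares the final score against two cutoffs. The function name, the cutoff values, and the comparisons are assumptions for illustration; the real auto-learn logic may compute its own score and ignore some rules, which this sketch does not model.

```python
def autolearn_decision(score, ham_below=0.1, spam_above=12.0):
    """Hypothetical auto-learn policy driven purely by the final score."""
    if score < ham_below:
        return "ham"
    if score > spam_above:
        return "spam"
    return "none"  # too ambiguous to learn from


print(autolearn_decision(-100.0))  # short-circuited score -> "ham"
print(autolearn_decision(4.0))     # full score            -> "none"
```

Under these assumptions the short-circuited message is learned as ham, while the fully scored one is not learned at all.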

Likewise for awl.  I've always had this turned off and paid no attention to
it, but I assume that the way it learns whatever it learns is based on the
final score.  Does this mean that you can't shortcut with awl enabled?
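The same concern can be sketched for awl, assuming it keeps a per-sender average of past final scores and pulls each new score part of the way toward that average. The `SenderHistory` class, the `adjust` function, and the 0.5 factor are hypothetical and only illustrate why the recorded score matters.

```python
class SenderHistory:
    """Hypothetical per-sender record of past final scores."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def record(self, score):
        self.total += score
        self.count += 1

    def mean(self):
        return self.total / self.count if self.count else None


def adjust(score, history, factor=0.5):
    """Pull a new score part of the way toward the sender's historical mean."""
    mean = history.mean()
    if mean is None:
        return score  # no history yet, nothing to adjust
    return score + (mean - score) * factor


shortcut, full = SenderHistory(), SenderHistory()
for _ in range(3):
    shortcut.record(-100.0)  # scores recorded when rules were short-circuited
    full.record(4.0)         # scores recorded when every rule was run

print(adjust(8.0, shortcut))  # -46.0
print(adjust(8.0, full))      # 6.0
```

With a threshold of 5, a later message scoring 8 from the same sender would be pulled well below it by the short-circuited history, but would stay above it with the fully scored history.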

Thoughts and comments most welcome!

        Loren


Re: Thoughts/ramblings on rule short circuiting

Posted by John Madden <ma...@skynet.ie>.

On (29/07/05 02:29), Loren Wilton didst pronounce:
> As far as the SA classification is concerned, the -100 score is sufficient.
> But is it sufficient for humans?
> 
If you're considering a whitelist-from rule as the -100, then I think it
should be sufficient to cause the message to be shortcut. After all, the 
human user of SA has decided that all mail from person@somewhere should 
never be marked as spam, and it is my understanding that the score of
-100 was given to this so as to ensure it's not caught by anything else.
Essentially, IMO, this mail doesn't even need to be scored, just shortcut, 
given X-Spam-* headers and let through. Similarly the blacklist-from 
rules should be shortcut but would need a score in this case (since some 
people filter on score, not on X-Spam-Status).
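A rough sketch of that policy, assuming the whitelist and blacklist are plain sets of sender addresses: a whitelisted sender is passed through with no score at all, while a blacklisted sender is shortcut but still given a score so that score-based filters keep working. The function name, the returned fields, and the score of 100 are hypothetical.

```python
def shortcut_verdict(sender, whitelist, blacklist, blacklist_score=100.0):
    """Return a verdict without running any rules, or None if rules are needed."""
    if sender in whitelist:
        # Let through unscored; only the spam/not-spam flag is reported.
        return {"is_spam": False, "score": None}
    if sender in blacklist:
        # Still attach a score for people who filter on score.
        return {"is_spam": True, "score": blacklist_score}
    return None  # no shortcut applies; run the normal rule set


whitelist = {"person@somewhere"}
blacklist = {"spammer@elsewhere"}

print(shortcut_verdict("person@somewhere", whitelist, blacklist))
print(shortcut_verdict("spammer@elsewhere", whitelist, blacklist))
print(shortcut_verdict("stranger@nowhere", whitelist, blacklist))
```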

There are also other possibilities for shortcutting, e.g. BAYES_99, but
I don't know whether these should be dealt with in the same way, since
they were machine-learned rather than explicitly stated by a human.

-- 
Chat ya later,

John.
--
BOFH excuse #1: clock speed