Posted to users@spamassassin.apache.org by Matt Kettler <mk...@verizon.net> on 2009/09/22 13:13:17 UTC

Re: partial (lazy) scoring? (shortcircuit features)

ArtemGr wrote:
> I would like to configure Spamassassin to only do certain tests
> when the "required_score" is not yet reached.
> For example, do the usual rule-based and bayesian tests first,
> and if the score is lower than the "required_score",
> then do the DCC and RAZOR2 tests.
>
> Is it possible?
>
>
>   
Not exactly the way you describe, no.

SpamAssassin has a priority facility and a shortcircuit facility that
provide vaguely similar functionality, but they don't work exactly the
way you want.

Priority allows you to change the order in which rules are processed, so
you can make some rules run earlier, or later, than others. This part
fits your needs.

Shortcircuit allows you to stop processing when a particular rule fires.
However, it is strictly based on the rule firing, not the message score.
This part doesn't fit your needs.

Collectively they allow you to make some rules (e.g. USER_IN_WHITELIST,
USER_IN_BLACKLIST) run first, and abort processing if they fire.

However, this doesn't really work for your scenario of delaying a few
rules and aborting if they're not needed.

I suppose there could be some kind of mod to the shortcircuit plugin to
do this, however it's a little dangerous from a false-positive
perspective, so the devs may not be very enthusiastic about adding it.

A long, long time ago, SpamAssassin had a feature where it would abort
as soon as a given score was hit. However, this could cause false
positives. A nonspam message might hit several spam rules early in the
processing and drive the score over the abort threshold, causing it to
be tagged as spam. The abort would then prevent it from matching the
negative-scoring rules that would have pushed it back under the spam
threshold.
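
The failure mode is easy to see in a toy model. The sketch below is not
SpamAssassin code: the rule names and scores are invented, and it simply
sums scores in order, optionally stopping once a threshold is reached.

```python
# Toy model of score-based early abort; not SpamAssassin code.
# Rule names and scores below are invented for illustration.
from typing import List, Optional, Tuple

Rule = Tuple[str, float]  # (name, score)

def scan(rule_hits: List[Rule], abort_at: Optional[float] = None) -> float:
    """Sum rule scores in order; stop once the running total reaches abort_at."""
    total = 0.0
    for _name, score in rule_hits:
        total += score
        if abort_at is not None and total >= abort_at:
            break  # the remaining rules never run
    return total

REQUIRED_SCORE = 5.0

# A nonspam message that trips two spam rules before its whitelist rule runs.
hits = [
    ("HYPOTHETICAL_SPAM_RULE_A", 3.0),
    ("HYPOTHETICAL_SPAM_RULE_B", 2.5),
    ("HYPOTHETICAL_WHITELIST_RULE", -100.0),
]

full = scan(hits)
aborted = scan(hits, abort_at=REQUIRED_SCORE)
print(f"full scan:   {full} spam={full >= REQUIRED_SCORE}")
print(f"early abort: {aborted} spam={aborted >= REQUIRED_SCORE}")
```

With the full scan the whitelist rule pulls the message far below the
threshold; with the early abort it never gets the chance.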

Now, that was a very old version of SA. It had no rule priorities, and
it checked the score quite often in between rules.

In theory, a feature could be added to let you do something like this
(SA doesn't have this feature, but I'm proposing it could be added):

shortcircuit_if_score_above_at <score> <priority>

Which would let you do:

shortcircuit_if_score_above_at 5.0 999999
priority RAZOR2_CHECK 1000000
priority DCC_CHECK 1000000

You'd have to be careful about your priorities, as this will prevent any
nonspam rules with higher priority numbers from running, but it could
work for this scenario.
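
A rough simulation of how such a checkpoint might behave, as a sketch
only. The directive does not exist in SA, so the semantics here are
assumptions: rules run in ascending priority order, every listed rule
fires, and the score is checked once, just before the first rule whose
priority is above the checkpoint. RAZOR2_CHECK and DCC_CHECK are the stock
rule names; the other names and all scores are made up.

```python
# Toy model of the proposed "shortcircuit_if_score_above_at"; not SpamAssassin code.
# Assumed semantics: one score check, made just before the first rule whose
# priority is greater than the checkpoint priority. Scores are invented.
from typing import List, Tuple

Rule = Tuple[str, int, float]  # (name, priority, score)

def scan_with_checkpoint(
    rules: List[Rule], threshold: float, checkpoint_priority: int
) -> Tuple[float, List[str]]:
    """Return the final score and the names of the rules that actually ran."""
    total = 0.0
    ran: List[str] = []
    checked = False
    for name, priority, score in sorted(rules, key=lambda r: r[1]):
        if not checked and priority > checkpoint_priority:
            checked = True
            if total > threshold:
                break  # already over the threshold: skip every later rule
        total += score
        ran.append(name)
    return total, ran

LATE = [("RAZOR2_CHECK", 1000000, 1.5), ("DCC_CHECK", 1000000, 1.5)]

obvious_spam = [("HYPOTHETICAL_RULE_A", 0, 4.0), ("HYPOTHETICAL_RULE_B", 0, 2.5)] + LATE
borderline = [("HYPOTHETICAL_RULE_A", 0, 4.0)] + LATE

print(scan_with_checkpoint(obvious_spam, 5.0, 999999))
print(scan_with_checkpoint(borderline, 5.0, 999999))
```

The first message is already over 5.0 at the checkpoint, so the network
tests are skipped; the second is not, so they run as usual.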

With a similar "score below" feature, you could also skip these rules on
nonspam when running them would be pointless:

shortcircuit_if_score_below_at -1.17 999999

The highest score you can ever get out of both DCC and Razor (with the
current scores) is +6.17 (unlikely, but possible, assuming both the e4
and e8 Razor engines report a high cf and DCC fires too). If the score is
already below -1.17, there's no way these rules can ever drive the score
up enough to be over 5.0 and make the message spam.
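
The -1.17 cutoff is just the spam threshold minus the most the late rules
can add. A small sketch of that arithmetic, using the two figures from
this message and assuming a message counts as spam once its score reaches
the threshold (the function name is made up):

```python
# Arithmetic behind the "score below" cutoff. Decimal avoids float rounding noise.
from decimal import Decimal

REQUIRED_SCORE = Decimal("5.0")   # spam threshold used in this message
MAX_LATE_GAIN = Decimal("6.17")   # maximum combined DCC + Razor score quoted above

# Lowest score from which the late rules could still reach the threshold.
cutoff = REQUIRED_SCORE - MAX_LATE_GAIN

def late_rules_can_matter(score_so_far: Decimal) -> bool:
    """True if the late rules could still lift the score to the threshold."""
    return score_so_far + MAX_LATE_GAIN >= REQUIRED_SCORE

print(cutoff)
print(late_rules_can_matter(Decimal("-2.0")))
print(late_rules_can_matter(Decimal("1.3")))
```

With a different set of late rules, or different scores, MAX_LATE_GAIN and
therefore the cutoff would change.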

Obviously this would greatly depend on what rules you're running late.




Re: partial (lazy) scoring? (shortcircuit features)

Posted by Matt Kettler <mk...@verizon.net>.
Matus UHLAR - fantomas wrote:
>> Matt Kettler <mkettler_sa <at> verizon.net> writes:
>>     
>>> In theory, a feature could be added to let you do something like this
>>> (SA doesn't have this feature, but I'm proposing it could be added):
>>>       
>
> On 22.09.09 11:46, ArtemGr wrote:
>   
>> That would be a nice optimization: most of the spam we receive has a >10
>> score. It seems a real waste of resources to perform all the complex tests
>> (like distributed hashing or OCR-ing) on spam which is DNS and
>> rule-detectable.
>>     
>
> You haven't read Matt's explanation of why it wasn't a good idea, have you?
>
> There are rules with negative scores, which can push the score back to
> ham, e.g. whitelist. Would you like to stop scoring before e.g. whitelist is
> checked?
>   
*You* obviously haven't read my message, which explains how this *can*
be done safely.



Re: partial (lazy) scoring? (shortcircuit features)

Posted by ArtemGr <ar...@gmail.com>.
Matus UHLAR - fantomas <uhlar <at> fantomas.sk> writes:
> You haven't read Matt's explanation of why it wasn't a good idea, have you?
>
> There are rules with negative scores, which can push the score back to
> ham, e.g. whitelist. Would you like to stop scoring before e.g. whitelist is
> checked?

I am not going to postpone the execution of the negative score rules.


Re: partial (lazy) scoring? (shortcircuit features)

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
> Matt Kettler <mkettler_sa <at> verizon.net> writes:
> > In theory, a feature could be added to let you do something like this
> > (SA doesn't have this feature, but I'm proposing it could be added):

On 22.09.09 11:46, ArtemGr wrote:
> That would be a nice optimization: most of the spam we receive has a >10
> score. It seems a real waste of resources to perform all the complex tests
> (like distributed hashing or OCR-ing) on spam which is DNS and
> rule-detectable.

You haven't read Matt's explanation of why it wasn't a good idea, have you?

There are rules with negative scores, which can push the score back to
ham, e.g. whitelist. Would you like to stop scoring before e.g. whitelist is
checked?

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Micro$oft random number generator: 0, 0, 0, 4.33e+67, 0, 0, 0...

Re: partial (lazy) scoring? (shortcircuit features)

Posted by ArtemGr <ar...@gmail.com>.
Matt Kettler <mkettler_sa <at> verizon.net> writes:
> In theory, a feature could be added to let you do something like this
> (SA doesn't have this feature, but I'm proposing it could be added):

That would be a nice optimization: most of the spam we receive has a score
above 10. It seems a real waste of resources to perform all the complex tests
(like distributed hashing or OCR-ing) on spam which is DNS and rule-detectable.