Posted to users@spamassassin.apache.org by ArtemGr <ar...@gmail.com> on 2009/09/22 10:14:06 UTC

partial (lazy) scoring?

I would like to configure Spamassassin to only do certain tests
when the "required_score" is not yet reached.
For example, do the usual rule-based and bayesian tests first,
and if the score is lower than the "required_score",
then do the DCC and RAZOR2 tests.

Is it possible?


Re: partial (lazy) scoring? (shortcircuit features)

Posted by ArtemGr <ar...@gmail.com>.
Matus UHLAR - fantomas <uhlar <at> fantomas.sk> writes:
> You haven't read Matt's explanation of why it wasn't a good idea, have you?
> 
> There are rules with negative scores, which can push the score back to
> ham, e.g. a whitelist. Would you like to stop scoring before e.g. the
> whitelist is checked?

I am not going to postpone the execution of the negative score rules.


Re: partial (lazy) scoring? (shortcircuit features)

Posted by Matt Kettler <mk...@verizon.net>.
Matus UHLAR - fantomas wrote:
>> Matt Kettler <mkettler_sa <at> verizon.net> writes:
>>     
>>> In theory, a feature could be added to let you do something like this
>>> (SA doesn't have this feature, but I'm proposing it could be added):
>>>       
>
> On 22.09.09 11:46, ArtemGr wrote:
>   
>> That would be a nice optimization: most of the spam we receive has a >10
>> score. It seems a real waste of resources to perform all the complex tests
>> (like distributed hashing or OCR-ing) on spam which is DNS and
>> rule-detectable.
>>     
>
> You haven't read Matt's explanation of why it wasn't a good idea, have you?
>
> There are rules with negative scores, which can push the score back to
> ham, e.g. a whitelist. Would you like to stop scoring before e.g. the
> whitelist is checked?
>   
*You* obviously haven't read my message, which explains how this *can*
be done safely.



Re: partial (lazy) scoring? (shortcircuit features)

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
> Matt Kettler <mkettler_sa <at> verizon.net> writes:
> > In theory, a feature could be added to let you do something like this
> > (SA doesn't have this feature, but I'm proposing it could be added):

On 22.09.09 11:46, ArtemGr wrote:
> That would be a nice optimization: most of the spam we receive has a >10
> score. It seems a real waste of resources to perform all the complex tests
> (like distributed hashing or OCR-ing) on spam which is DNS and
> rule-detectable.

You haven't read Matt's explanation of why it wasn't a good idea, have you?

There are rules with negative scores, which can push the score back to
ham, e.g. a whitelist. Would you like to stop scoring before e.g. the
whitelist is checked?

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Micro$oft random number generator: 0, 0, 0, 4.33e+67, 0, 0, 0...

Re: partial (lazy) scoring? (shortcircuit features)

Posted by ArtemGr <ar...@gmail.com>.
Matt Kettler <mkettler_sa <at> verizon.net> writes:
> In theory, a feature could be added to let you do something like this
> (SA doesn't have this feature, but I'm proposing it could be added):

That would be a nice optimization: most of the spam we receive has a >10 score.
It seems a real waste of resources to perform all the complex tests (like
distributed hashing or OCR-ing) on spam which is DNS and rule-detectable.


Re: partial (lazy) scoring? (shortcircuit features)

Posted by Matt Kettler <mk...@verizon.net>.
ArtemGr wrote:
> I would like to configure Spamassassin to only do certain tests
> when the "required_score" is not yet reached.
> For example, do the usual rule-based and bayesian tests first,
> and if the score is lower than the "required_score",
> then do the DCC and RAZOR2 tests.
>
> Is it possible?
>
>
>   
Not exactly the way you describe, no.

SpamAssassin has a priority and a shortcircuit facility that provide a
vaguely similar functionality, but it doesn't really work exactly the
way you want.

Priority allows you to change the order in which rules are processed, so
you can make some rules run earlier, or later, than others. This part
fits your needs.

Shortcircuit allows you to stop processing when a particular rule fires.
However, it is strictly based on the rule firing, not the message score.
This part doesn't fit your needs.

Collectively they allow you to make some rules (e.g. USER_IN_WHITELIST,
USER_IN_BLACKLIST) run first, and abort processing if they fire.

However, this doesn't really work for your scenario of delaying a few
rules and aborting if they're not needed.

I suppose there could be some kind of mod to the shortcircuit plugin to
do this, however it's a little dangerous from a false-positive
perspective, so the devs may not be very enthusiastic about adding it.

A long, long time ago, SpamAssassin had a feature where it would abort
as soon as a given score was hit. However, this introduced a problem
where it could cause false positives. A nonspam message might hit
several spam rules early in the processing, and drive the score over the
abort threshold, causing it to be tagged as spam. However, this could
prevent it from matching negative scoring rules that would push it back
under the spam threshold.
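
The failure mode described above can be illustrated with a minimal Python
sketch. It is not SpamAssassin code: the rule names and scores are invented,
and the loop only models "stop summing once the abort score is reached".

```python
# Hypothetical rule hits for one nonspam message: (rule name, score).
# Names and scores are invented for illustration only.
HITS = [
    ("SPAMMY_SUBJECT", 3.0),
    ("SPAMMY_BODY", 2.5),
    ("WHITELISTED_SENDER", -100.0),  # negative-scoring rule evaluated last
]
REQUIRED_SCORE = 5.0


def score_with_abort(hits, abort_at):
    """Sum scores in order, stopping as soon as the total reaches abort_at."""
    total = 0.0
    for _name, score in hits:
        total += score
        if total >= abort_at:
            break  # later rules, including negative ones, never run
    return total


def score_full(hits):
    """Sum every hit, as a complete scan would."""
    return sum(score for _name, score in hits)


aborted = score_with_abort(HITS, REQUIRED_SCORE)
full = score_full(HITS)
print(aborted >= REQUIRED_SCORE)  # True: tagged as spam after the early abort
print(full >= REQUIRED_SCORE)     # False: the full scan classifies it as ham
```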

Now, that version of SA was a long time ago, and we didn't have any
priority going on, and it was also checking the score pretty often in
between rules.

In theory, a feature could be added to let you do something like this
(SA doesn't have this feature, but I'm proposing it could be added):

shortcircuit_if_score_above_at <score> <priority>

Which would let you do:

shortcircuit_if_score_above_at 5.0 999999
priority RAZOR_CHECK 1000000
priority DCC_CHECK 1000000

You'd have to be careful about your priorities, as this will prevent any
nonspam rules with higher priority numbers from running, but it could
work for this scenario.
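
To make the proposed behaviour concrete, here is a small Python model of it.
This is only a sketch of a feature that does not exist in SA. The function
name run_rules, the rule LATE_WHITELIST and all scores are made up, each list
holds only rules that would fire, and it assumes the check happens once,
after every rule with a priority number up to and including the given one.

```python
def run_rules(rules, check_priority, threshold):
    """Model of the *proposed* shortcircuit_if_score_above_at directive.

    rules: list of (name, priority, score) for rules that fire.
    Lower priority numbers run first. Before the first rule whose priority
    exceeds check_priority, stop if the running score is above threshold.
    """
    total = 0.0
    ran = []
    checked = False
    for name, priority, score in sorted(rules, key=lambda r: r[1]):
        if not checked and priority > check_priority:
            checked = True
            if total > threshold:
                break  # skip all remaining (late) rules
        total += score
        ran.append(name)
    return total, ran


LATE = [("RAZOR_CHECK", 1000000, 1.5), ("DCC_CHECK", 1000000, 2.0)]

# Already spam at the checkpoint: the network checks are skipped.
spammy = run_rules([("LOCAL_RULES", 0, 6.2)] + LATE, 999999, 5.0)

# Below the threshold at the checkpoint: the late rules still run.
borderline = run_rules([("LOCAL_RULES", 0, 1.0)] + LATE, 999999, 5.0)

# The caveat: a nonspam rule placed after the checkpoint is skipped too.
risky = run_rules(
    [("LOCAL_RULES", 0, 6.2), ("LATE_WHITELIST", 1000001, -10.0)] + LATE,
    999999,
    5.0,
)
print(spammy, borderline, risky)
```

In the last case the message stays at 6.2 even though the late negative rule
would have pulled it well below 5.0, which is why the priorities need care.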

You could also skip these rules on nonspam, where they're equally
pointless, with a similar "score below" feature:

shortcircuit_if_score_below_at -1.17 999999

The highest score you can ever get out of both DCC and Razor (with the
current scores) is +6.17 (unlikely, but possible, assuming both e4 and
e8 have high cf's and DCC fires too). If the score is already below
-1.17, there's no way these rules can ever drive the score up enough to
be over 5.0 and make the message spam.
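
The -1.17 cutoff is just the spam threshold minus the most the late rules can
add. A short Python check of that arithmetic, using the 5.0 and 6.17 figures
from this message (the helper name can_still_reach_spam is made up):

```python
REQUIRED_SCORE = 5.0
MAX_LATE_SCORE = 6.17  # figure quoted above for DCC plus Razor combined

# Below this score the late rules cannot lift the message to the threshold.
skip_below = round(REQUIRED_SCORE - MAX_LATE_SCORE, 2)
print(skip_below)  # -1.17


def can_still_reach_spam(score_so_far):
    """True if the late rules could still lift the score to the threshold."""
    return score_so_far + MAX_LATE_SCORE >= REQUIRED_SCORE


print(can_still_reach_spam(-2.0))  # False: running DCC/Razor is pointless
print(can_still_reach_spam(0.0))   # True: they could still matter
```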

Obviously this would greatly depend on what rules you're running late.




Re: partial (lazy) scoring? - run a second time?

Posted by RW <rw...@googlemail.com>.
On Fri, 25 Sep 2009 12:32:35 +0000 (UTC)
ArtemGr <ar...@gmail.com> wrote:

> Benny Pedersen <me <at> junc.org> writes:

> > my own former mailhost has changed from spamassassin to dspam, less
> > work for him and his users, and definitely less work for his low
> > budget quad xeon intel server with 6Gb ram and a lot of
> > disks/diskspace
> 
> Mmm. DSPAM seems to be just another Bayesian filter, no?

It's not simply a standalone filter like SpamAssassin or Bogofilter,
it's an integrated system with quarantine and a web-based user
interface, which is why it appeals to many admins. You can use it as a
standalone filter, but they don't go out of their way to make that
intuitive.


> > but i keep away from dspam, it is too unstable for me, but it might
> > just be me as always :)

The original developer sold it to a commercial company who did little
with it, and it got into a bit of a mess. Earlier in the year a new
project was started to maintain it, so hopefully it'll improve.


Re: partial (lazy) scoring? - run a second time?

Posted by ArtemGr <ar...@gmail.com>.
Benny Pedersen <me <at> junc.org> writes:
> fuzzyocr stops scanning if the spam score is over a limit, why run
> ocr when spamassassin can detect it without ocr?

Good to know.

> it could maybe be an option to make spamassassin stop scanning in
> general if the spam score is high?
> 
> there are a lot of plugins that basically just do a digest match and
> check remotely whether it was found elsewhere, the only difference is
> how the digest is computed and for what
> 
> my own former mailhost has changed from spamassassin to dspam, less
> work for him and his users, and definitely less work for his low budget
> quad xeon intel server with 6Gb ram and a lot of disks/diskspace

Mmm. DSPAM seems to be just another Bayesian filter, no?
I'm now using j-chkmail milter for DNSBL and URI DNSBL checks, then CRM114.




Re: partial (lazy) scoring? - run a second time?

Posted by Benny Pedersen <me...@junc.org>.
On tor 24 sep 2009 10:59:35 CEST, ArtemGr wrote
> Do you have measurements, or are you just imagining things?
> OCR-ing all the graphic attachments might be much slower
> than your usual spamassassin run.
> DCC and Pyzor checks might introduce large delays as well.

fuzzyocr stops scanning if the spam score is over a limit, why run
ocr when spamassassin can detect it without ocr?

it could maybe be an option to make spamassassin stop scanning in
general if the spam score is high?

there are a lot of plugins that basically just do a digest match and
check remotely whether it was found elsewhere, the only difference is
how the digest is computed and for what

my own former mailhost has changed from spamassassin to dspam, less
work for him and his users, and definitely less work for his low budget
quad xeon intel server with 6Gb ram and a lot of disks/diskspace

but i keep away from dspam, it is too unstable for me, but it might
just be me as always :)

-- 
xpoint


Re: partial (lazy) scoring? - run a second time?

Posted by ArtemGr <ar...@gmail.com>.
Matus UHLAR - fantomas <uhlar <at> fantomas.sk> writes:
> > That raises the question of whether the basic detections can be turned off.
> > 
> > I found the following options:
> > skip_rbl_checks 1
> > dns_available no
> > use_bayes 0
> > use_bayes_rules 0
> > bayes_auto_learn 0
> > - but I do not see an option
> > to turn off the static rules shipped with spamassassin.
> > 
> > Is there such an option?
> 
> No, unless you run spamassassin twice, or spamc once and spamassassin
> the other time. Note that running spamassassin would take much more time, and
> the most CPU-intensive rules are those that are not disabled in local
> mode. 
> 
> Why do you want to do that?

Do you have measurements, or are you just imagining things?
OCR-ing all the graphic attachments might be much slower
than your usual spamassassin run.
DCC and Pyzor checks might introduce large delays as well.



Re: partial (lazy) scoring? - run a second time?

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
> ArtemGr <artemciy <at> gmail.com> writes:
> > I would like to configure Spamassassin to only do certain tests
> > when the "required_score" is not yet reached.
> > For example, do the usual rule-based and bayesian tests first,
> > and if the score is lower than the "required_score",
> > then do the DCC and RAZOR2 tests.

On 22.09.09 12:08, ArtemGr wrote:
> Another way comes to mind: run SpamAssassin the first time
> with the basic set of detections, and if the score is low,
> run it a second time with only the advanced detections turned on
> (using an alternative configuration by means of "--configpath=").
> 
> That raises the question of whether the basic detections can be turned off.
> 
> I found the following options:
> skip_rbl_checks 1
> dns_available no
> use_bayes 0
> use_bayes_rules 0
> bayes_auto_learn 0
> - but I do not see an option
> to turn off the static rules shipped with spamassassin.
> 
> Is there such an option?

No, unless you run spamassassin twice, or spamc once and spamassassin
the other time. Note that running spamassassin would take much more time, and
the most CPU-intensive rules are those that are not disabled in local
mode. 

Why do you want to do that?
-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Boost your system's speed by 500% - DEL C:\WINDOWS\*.*

Re: partial (lazy) scoring? - run a second time?

Posted by ArtemGr <ar...@gmail.com>.
ArtemGr <artemciy <at> gmail.com> writes:
> I would like to configure Spamassassin to only do certain tests
> when the "required_score" is not yet reached.
> For example, do the usual rule-based and bayesian tests first,
> and if the score is lower than the "required_score",
> then do the DCC and RAZOR2 tests.
> 
> Is it possible?

Another way comes to mind: run SpamAssassin the first time
with the basic set of detections, and if the score is low,
run it a second time with only the advanced detections turned on
(using an alternative configuration by means of "--configpath=").

That raises the question of whether the basic detections can be turned off.

I found the following options:
skip_rbl_checks 1
dns_available no
use_bayes 0
use_bayes_rules 0
bayes_auto_learn 0
- but I do not see an option
to turn off the static rules shipped with spamassassin.

Is there such an option?