Posted to users@spamassassin.apache.org by Loren Wilton <lw...@earthlink.net> on 2004/02/12 12:37:39 UTC

Suggestion on rules abilities

This is a suggestion to those developing SA.  Or maybe it already exists and
I just don't know it.

There ought to be some sort of "if / else" type construct that can be used
for creating rules.  For instance, I just got a spam that triggered the
following:

 3.0 PT_WORDLIST_13         BODY: string of 13+ random words
  10 PT_WORDLIST_30         BODY: string of 30+ random words
 1.0 PT_WORDLIST_10         BODY: string of 10+ random words
 0.1 HTML_MESSAGE           BODY: HTML included in message
 1.0 HTML_FONT_BIG          BODY: HTML has a big font
 5.4 BAYES_99               BODY: Bayesian spam probability is 99 to 100%
                            [score: 0.9988]
  10 TINY_FONT_0            BODY: Body contains 0pt font
 0.0 LOCAL_DRUGS_DIET       LOCAL_DRUGS_DIET
 0.0 LOCAL_DRUGS_ANXIETY    LOCAL_DRUGS_ANXIETY
 0.0 LOCAL_DRUGS_PAIN       LOCAL_DRUGS_PAIN
 1.0 LOCAL_DRUGS_MALEDYSFUNCTION LOCAL_DRUGS_MALEDYSFUNCTION
 0.0 LOCAL_DRUGS_MUSCLE     LOCAL_DRUGS_MUSCLE
 1.0 LOCAL_DRUGS_PAIN_MALEDYS LOCAL_DRUGS_PAIN_MALEDYS
 0.5 LOCAL_DRUGS_DIET_PAIN  LOCAL_DRUGS_DIET_PAIN
 1.0 LOCAL_DRUGS_ANXIETY_MALEDYS LOCAL_DRUGS_ANXIETY_MALEDYS
 1.0 LOCAL_DRUGS_DIET_MALEDYS LOCAL_DRUGS_DIET_MALEDYS
 1.0 LOCAL_DRUGS_MANYKINDS  LOCAL_DRUGS_MANYKINDS

Now, if there are 30 random words, there are probably also 13, and also 10.
If there are many kinds of drugs, there are probably a lot of individual
kinds of drugs.  So once you have a major hit, why bother trying for more
minor versions of the same thing that are obviously going to hit as well?
Yes, they add up to a cumulative score.  But if I could say "if I have 30
random words, don't bother checking for 10 or 13, and just score the thing 5
points" I would get exactly the same effect, without having to take the time
to run the minor rules that I *know* are going to hit anyway.

No matter how well written, a regexp must take *some* time to run.  If some
of them can be avoided now and then it should help reduce the overall system
load.  Or at least try to hold it somewhat constant as the spam load
continues to increase geometrically.

        Loren


Re: Suggestion on rules abilities

Posted by Matt Kettler <mk...@comcast.net>.
At 03:37 AM 2/12/04 -0800, Loren Wilton wrote:
>This is a suggestion to those developing SA.  Or maybe it already exists and
>I just don't know it.
>
>There ought to be some sort of "if / else" type construct that can be used
>for creating rules.  For instance, I just got a spam that triggered the
>following:

Meta rules do this kind of thing.
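
As a rough sketch of the "if / else" idea using meta rules (all rule names,
regexes and scores here are made up for illustration and are not taken from
any shipped ruleset), only the highest matching tier gets a score:

```
# Hypothetical unscored sub-rules; the leading "__" keeps them out of the score.
body  __EX_WORDS_10   /(?:\b[a-z]+\s+){10}/i
body  __EX_WORDS_13   /(?:\b[a-z]+\s+){13}/i
body  __EX_WORDS_30   /(?:\b[a-z]+\s+){30}/i

# "if 30, else if 13, else if 10": each tier fires only when the next one up did not.
meta  EX_WORDLIST_30  __EX_WORDS_30
meta  EX_WORDLIST_13  (__EX_WORDS_13 && !__EX_WORDS_30)
meta  EX_WORDLIST_10  (__EX_WORDS_10 && !__EX_WORDS_13)

score EX_WORDLIST_30  5.0
score EX_WORDLIST_13  3.0
score EX_WORDLIST_10  1.0
```

This gives the scoring effect of an if / else chain.  As far as I know the
sub-rule regexes are still evaluated, so by itself it probably does not save
the regexp time.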

>Now, if there are 30 random words, there are probably also 13, and also 10.
>If there are many kinds of drugs, there are probably a lot of individual
>kinds of drugs.  So once you have a major hit, why bother trying for more
>minor versions of the same thing that are obviously going to hit as well?

Actually, the manykinds drug rule doesn't have any regexes in it.

It's a meta rule, and is based on adding up all the single-drug rules :)

Take a look at the bottom of antidrug.cf. Antidrug uses meta rules 
extensively to avoid repeatedly searching for the same strings.
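
For illustration only, a "many kinds" style meta rule built by adding up
single-category sub-rules might look roughly like this.  The names, regexes
and threshold below are assumptions invented for this sketch, not the actual
contents of antidrug.cf:

```
# Hypothetical single-category sub-rules; each string is searched for once.
body     __EX_DRUG_DIET     /\bphentermine\b/i
body     __EX_DRUG_PAIN     /\bvicodin\b/i
body     __EX_DRUG_ANXIETY  /\bxanax\b/i

# No regex here: the meta rule just sums the sub-rule hits and compares.
meta     EX_DRUGS_MANYKINDS ((__EX_DRUG_DIET + __EX_DRUG_PAIN + __EX_DRUG_ANXIETY) >= 2)
describe EX_DRUGS_MANYKINDS Mentions two or more kinds of drugs
score    EX_DRUGS_MANYKINDS 1.0
```

The combination rule reuses the results of the sub-rules, so no string has to
be matched a second time.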