Posted to users@spamassassin.apache.org by Steven Stern <sd...@sterndata.com> on 2007/04/10 14:17:16 UTC

Help with rule

I'm trying to flag a type of spam that seems to be slipping through with
a very low score.

The common factor is that all of the messages have something like:

    Just type www [.] pillking [.] org
    Just type <FONT color=#ff0000>www</FONT> [.]
<STRONG><FONT color=#ff0000>pillking</FONT></STRONG> [.] <FONT
color=#ff0000>org</FONT></FONT>

   Just type www [dot] pilldoc [dot] org

I suspect a rule that looks for "www*pill*org" would work. How do I turn
that into a regex?

Re: Help with rule

Posted by Kelson <ke...@speed.net>.
Steven Stern wrote:
> I suspect a rule that looks for "www*pill*org" would work. How do I turn
> that into a regex?

Basic:                  /www.*pill.*org/
Slightly optimized:     /www.{1,30}pill.{1,30}org/

.     matches any single character.
*     means 0 or more of the preceding item, so
.*    matches 0 or more of any character.
{X,Y} means anywhere from X to Y of the preceding item.
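
To see those pieces in isolation, here is a small sketch in Python, whose
re module uses the same syntax as Perl for these constructs. The helper
name full() is made up for illustration and is not part of SpamAssassin.

```python
import re

def full(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# "." stands for any one character (not a newline, by default).
print(full(r"w.w", "w-w"))          # True

# "*" allows zero or more of the preceding item.
print(full(r"ab*", "a"))            # True
print(full(r"ab*", "abbb"))         # True

# ".*" is any run of characters, including an empty run.
print(full(r"www.*org", "wwworg"))  # True

# "{1,3}" requires between 1 and 3 of the preceding item.
print(full(r"a.{1,3}b", "a--b"))    # True
print(full(r"a.{1,3}b", "ab"))      # False
print(full(r"a.{1,3}b", "a----b"))  # False
```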

You don't want to use .* in an SA rule, though, because once it matches
"www" it'll keep looking for a long time, until it finds "pill" or runs
out of text to look at.  .{1,30} will match 1 to 30 of any character in
a row, so once it finds "www" it will only look through the next 30
characters for "pill".

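As a rough check of the difference, the sketch below runs both patterns
in Python against the two plain-text samples from the original post and
against one made-up innocent sentence (the "ham" string is an assumption
for illustration, not a real message). The HTML-formatted sample is left
out; as raw markup it has more than 30 characters between "www" and
"pill".

```python
import re

BASIC = re.compile(r"www.*pill.*org")
BOUNDED = re.compile(r"www.{1,30}pill.{1,30}org")

# Plain-text samples quoted in the original post.
spam = [
    "Just type www [.] pillking [.] org",
    "Just type www [dot] pilldoc [dot] org",
]

# Made-up innocent sentence where the three words are far apart.
ham = ("www.example.com has an article about a new allergy pill "
       "approved by a health organization")

for line in spam:
    print(bool(BASIC.search(line)), bool(BOUNDED.search(line)))  # True True

print(bool(BASIC.search(ham)), bool(BOUNDED.search(ham)))        # True False
```

Besides limiting how far the engine scans, the bounded form also avoids
firing on text where the three words merely happen to share a line.
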
You can also make it more specific, matching things only at word 
boundaries, etc.
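
One possible way to add word boundaries, sketched in Python (\b and \w
behave the same way in Perl). The exact pattern is only a suggestion, and
the "ham" sentence is made up to show the kind of accidental match that
the boundaries rule out.

```python
import re

BOUNDED = re.compile(r"www.{1,30}pill.{1,30}org")

# \b marks a word boundary. \w* lets "pill" be followed by more letters,
# as in "pillking" or "pilldoc".
WORDS = re.compile(r"\bwww\b.{1,30}\bpill\w*.{1,30}\borg\b")

spam = "Just type www [dot] pilldoc [dot] org"
ham = "awww, the caterpillar left the organization"  # made-up example

print(bool(BOUNDED.search(spam)), bool(WORDS.search(spam)))  # True True
print(bool(BOUNDED.search(ham)), bool(WORDS.search(ham)))    # True False
```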

There's a good tutorial and reference at www.regular-expressions.info -- 
one of the few legit .info names I've seen.

-- 
Kelson Vibber
SpeedGate Communications <www.speed.net>