Posted to users@spamassassin.apache.org by Steve Bondy <sb...@rafte.com> on 2004/11/03 21:09:45 UTC

Question about a custom rule

I'm working up a rule (for 2.64) to try to increase the score of HTML
messages that contain a 1x1 pixel image which references an ASP script.
I've come up with this:

rawbody LR_IMAGE_TAGGED_ASP =~ /\<img width\=1 height\=1 src\=.*\.asp.*/i
description LR_IMAGE_TAGGED_ASP Images with sources pointing to asp files
score LR_IMAGE_TAGGED_ASP 1.0
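One way to check a rule like this before deploying it is to run its regex against a few sample lines. The sketch below does that with Python's re module, which accepts this pattern unchanged (\< and \= are read as a literal < and =, as in Perl). The sample tags and the helper name original_hits are made up for illustration.

```python
import re

# The rule's regex, copied as-is from the rawbody line.
ORIGINAL = re.compile(r'\<img width\=1 height\=1 src\=.*\.asp.*', re.IGNORECASE)


def original_hits(line: str) -> bool:
    """Return True if the original pattern matches anywhere in the line."""
    return ORIGINAL.search(line) is not None


# Hypothetical sample lines.
samples = [
    '<img width=1 height=1 src="http://example.com/t.asp?id=42">',  # web bug
    '<img width=1 height=1 src="http://example.com/menu.asppy">',   # not ASP
    '<img width=100 height=50 src="http://example.com/logo.gif">',  # normal image
]
for line in samples:
    print(original_hits(line), line)
```

The second sample also matches even though .asppy is not an ASP file, which is the kind of false positive worth testing for.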

Could anyone offer me comments on this?  I haven't written too many of my
own rules, so I'm a bit weak on the regex.  Or is there already
something like this in the default rule set I haven't seen?  Or am I
just being paranoid about web-bugs?

Steve

Re: Question about a custom rule

Posted by Loren Wilton <lw...@earthlink.net>.
> rawbody LR_IMAGE_TAGGED_ASP =~ /\<img width\=1 height\=1 src\=.*\.asp.*/i

There are a couple of things to consider here.  The first is that rawbody
only gives the rule a single physical line of the message, so if the target
you are looking for spans lines the rule will never hit.  If your target is
almost always on a single line you will be ok with rawbody.  Sometimes you
can get around this by using 'full' instead of rawbody, but full rules see
the raw message, so base64 and similar encodings won't be decoded.
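To see why a tag that wraps across lines is missed, the sketch below takes the per-line behaviour described above as given and compares matching each line separately with matching the whole text at once. It is not SpamAssassin code: the pattern and the helper names are hypothetical, and the whole-text function is only a contrast, not a model of 'full'.

```python
import re

# Hypothetical pattern; \s+ lets the attributes be separated by any
# whitespace, including a line break when the whole text is searched.
PATTERN = re.compile(r'<img\s+width=1\s+height=1\s+src=\S+\.asp', re.IGNORECASE)


def hits_per_line(body: str) -> bool:
    """Try the pattern on each physical line separately (rawbody-style)."""
    return any(PATTERN.search(line) for line in body.splitlines())


def hits_whole_text(body: str) -> bool:
    """Try the pattern once against the whole text, across line breaks."""
    return PATTERN.search(body) is not None


one_line = '<img width=1 height=1 src=http://example.com/t.asp>'
wrapped = '<img width=1 height=1\nsrc=http://example.com/t.asp>'

print(hits_per_line(one_line), hits_whole_text(one_line))
print(hits_per_line(wrapped), hits_whole_text(wrapped))
```

The wrapped tag is found only by the whole-text search; no single line contains both <img and src=.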

You don't need the backslash before the = sign, but it won't hurt.
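A quick check in Python that the escaped and unescaped forms behave the same; like Perl, Python's re treats an escaped = as a plain literal =. The sample tags are hypothetical.

```python
import re

# '=' has no special meaning in a regex, so escaping it changes nothing.
with_escape = re.compile(r'width\=1 height\=1')
without_escape = re.compile(r'width=1 height=1')

tags = [
    '<img width=1 height=1 src="http://example.com/t.asp">',
    '<img width=10 height=1 src="http://example.com/logo.gif">',
]
for tag in tags:
    # Both patterns should agree on every sample.
    print(bool(with_escape.search(tag)), bool(without_escape.search(tag)))
```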

The area after src= is potentially of concern, both for efficiency and
possible false positives.

Looking for .* is almost always a bad idea, since on some messages,
depending on their format, it can take a very long time.  You would be
better off limiting the size of the search: src=.{15,36}, for instance.
Even better would be to limit what you are searching for.  This is probably
a cid: reference or a URL, so it will have a limited character set.  Perhaps
something like src=(?:cid:|http:\/\/)[\w\.\-]{10,40} to get past the first
part.
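Both suggestions can be tried out in Python. The sketch compiles each one and shows what the restricted prefix actually consumes after src=; the sample tags and the prefix_match helper are hypothetical.

```python
import re

# Restricted prefix: cid: or http:// followed by 10-40 word chars, dots or
# hyphens. It stops at the first '/', so it covers only the host part.
SRC_PREFIX = re.compile(r'src=(?:cid:|http:\/\/)[\w\.\-]{10,40}', re.IGNORECASE)

# Bounded "anything" alternative: never looks more than 36 chars past src=.
SRC_BOUNDED = re.compile(r'src=.{15,36}', re.IGNORECASE)


def prefix_match(tag: str):
    """Return the text SRC_PREFIX consumes, or None if it does not match."""
    m = SRC_PREFIX.search(tag)
    return m.group(0) if m else None


print(prefix_match('<img width=1 height=1 src=http://tracker.example.com/t.asp>'))
print(prefix_match('<img width=1 height=1 src=cid:part1.0123@example>'))
print(prefix_match('<img width=1 height=1 src=http://a.b/t.asp>'))  # host too short
print(SRC_BOUNDED.search('src=' + 'x' * 100).group(0) == 'src=' + 'x' * 36)
```

The bounded version caps how far the engine scans, and the restricted version additionally refuses characters that cannot appear in a host name or cid.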

The check for the asp suffix itself is a little dangerous as you have it
coded.  It will hit on ".asp" followed by anything: .asppy, for instance.
Since what you want is at the end of the URL or file name, you really don't
want another word character showing up after the asp.  Also, you really
don't care what else might show up after that (other than not being a word
character), so .* at the end of the regex buys you nothing except another
time sink.  A better choice might be \.asp\b or \.asp\W.  These will ensure
that you have asp with a non-word character after it.  Of course, you would
also like to be sure there isn't a dot after it, i.e. that it really is the
end of the name.  So \.asp[^\.\w] might be a good choice.
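The candidate suffix tests can be compared side by side on a few made-up src values. This is a Python sketch; the src strings and the suffix_hits helper are illustrative only.

```python
import re

# Candidate suffix tests from the discussion, keyed by their regex text.
CANDIDATES = [r'\.asp.*', r'\.asp\b', r'\.asp\W', r'\.asp[^\.\w]']
COMPILED = {pat: re.compile(pat, re.IGNORECASE) for pat in CANDIDATES}


def suffix_hits(src: str) -> dict:
    """Map each candidate pattern to whether it matches somewhere in src."""
    return {pat: rx.search(src) is not None for pat, rx in COMPILED.items()}


# Hypothetical src values as they might appear inside an <img> tag,
# each still followed by its closing quote.
for src in ['t.asp?id=42"', 'menu.asppy"', 't.asp.html"']:
    hits = suffix_hits(src)
    print(src.ljust(14), ' '.join('Y' if hits[p] else '-' for p in CANDIDATES))
```

Only \.asp.* accepts menu.asppy, and only \.asp[^\.\w] rejects t.asp.html. Note that \.asp\W and \.asp[^\.\w] both need one more character after asp, so a bare name at the very end of a line will not match; inside an <img> tag a closing quote or > normally follows.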

Putting it all together, you might end up with something like

/\<img width\=1\s+height\=1\s+src\=[\'\"]?[\w\.\/:\-]{5,40}\.asp[^\w\.]/i

The \s+ will allow one or more spaces (or tabs) between the elements, and
the optional [\'\"]? allows single or double quotes around the file name.
The /, : and - in the character class let a full http:// URL through to
the .asp check, while the {5,40} bound keeps the search short.
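A Python sketch of a combined pattern along these lines, tested against a few hypothetical tags. Its assumptions: \s+ between the attributes, and /, : and - allowed in the file-name class so that full http:// URLs can match. URLs with more than 40 characters before .asp are still missed because of the {5,40} bound.

```python
import re

# Sketch of a combined pattern (assumptions: \s+ between attributes, and
# / : - allowed in the file-name class so full http:// URLs can match).
WEB_BUG_ASP = re.compile(
    r'<img\s+width=1\s+height=1\s+src=[\'"]?'  # tag start, optional quote
    r'[\w./:\-]{5,40}'                          # bounded URL or file name
    r'\.asp[^\w.]',                             # .asp, then not a word char or dot
    re.IGNORECASE,
)


def is_asp_web_bug(html: str) -> bool:
    """True if html has a width=1 height=1 <img> whose src passes the .asp test."""
    return WEB_BUG_ASP.search(html) is not None


samples = [
    '<img width=1 height=1 src="http://example.com/t.asp?id=42">',
    '<img width=1\theight=1  src=http://example.com/track.asp>',
    "<IMG WIDTH=1 HEIGHT=1 SRC='http://example.com/t.asp'>",
    '<img width=1 height=1 src="http://example.com/menu.asppy">',
    '<img width=1 height=1 src="http://example.com/t.asp.html">',
    '<img width=1 height=1 src="http://example.com/spacer.gif">',
]
for html in samples:
    print(is_asp_web_bug(html), html)
```

The first three samples (quoted, tab-separated and upper-case) match; the .asppy, .asp.html and .gif samples do not.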

        Loren