You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@spamassassin.apache.org by Ned Slider <ne...@unixmail.co.uk> on 2011/11/26 20:46:26 UTC

Trends in spam

Hi list,

One of the main strengths of SpamAssassin is the ability to allow the 
user to write their own custom rules. However, writing good rules is not 
always easy and one of the hardest parts is identifying trends in spam 
worth targeting.

So what trends have you noticed this week?

Here's one I spotted whilst reviewing the weeks spam. I've seen a bunch 
of URIs with the following pattern:

http://some.example.domain.com/profile/12BobSmith/

There's even an example sent the LKML here:

https://lkml.org/lkml/2011/11/26/55

and have written a rule to match the /profile/12FirstnameLastname/ part. 
So in the spirit of sharing, enjoy my rule:

# URIs matching http://some.domain.com/profile/12FirstLastname/
uri		LOCAL_URI_PROFILE 
m{https?://.{1,40}/profile/\d\d[A-Z][a-z]{1,20}[A-Z][a-z]{1,20}/}
describe	LOCAL_URI_PROFILE	Spammy URI with profile
score		LOCAL_URI_PROFILE	1

Score as you see fit.

So far these URIs have been redirectors to "jobs" sites. Clearly there 
is some malware that is compromising web hosts using this same template 
so this type of rule nullifies them all in one go - no need to collect 
the domains and feed them into the black lists (although this is still a 
good idea).

So have you spotted any decent trends worth sharing this week?


Re: Trends in spam

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Sat, 2011-11-26 at 22:11 +0000, Ned Slider wrote:
> On 26/11/11 21:36, Karsten Bräckelmann wrote:
> > On Sat, 2011-11-26 at 19:46 +0000, Ned Slider wrote:
> > > # URIs matching http://some.domain.com/profile/12FirstLastname/
> > > uri		LOCAL_URI_PROFILE
> > > m{https?://.{1,40}/profile/\d\d[A-Z][a-z]{1,20}[A-Z][a-z]{1,20}/}
> >              ^^^^^^^
> >
> > Using [^/]+ for the domain part instead should be slightly more
> > efficient, since it will never result in backtracking. Also stricter,
> > since it anchors the /profile directory at the domain.
> 
> Now you made me go away and read the perl RE manual!

That was not my intention, but a good outcome nonetheless. :)

> Nice - [^/] meaning to match any character that is not a "/", one or 
> more (+) times. I'd not come across negated character classes before so 
> thanks for the tip!

Negated char classes, or more precisely mutually exclusive char classes
often are useful, if they are possible at all. In an example like the
above, one can think of it as full-speed slurp mode. No possible
alternative matches, no backtracking, no ambiguity.

The point is, that any range is greedy by default. So regardless whether
one uses the general + or *, or limited {x,y}, the regex engine first
will attempt to consume as much chars as possible -- gradually trying
with less, if the following sub-RE doesn't match.

So .{1,40}/ will first consume 40 chars, check if the next char is a
slash -- and give up a char, if it is no slash, to try again...

The amount of backtracking possibly involved becomes more obvious with
an example URI like this:
  example.com/profile/foo/bar/baz/

With mutually exclusive char sets, the greedy [^/]+ will consume all
non-slash chars, and without any backtracking or futile matching
attempts swallow exactly the domain part in one go.


> And anchoring the /profile directory at the domain definitely reduces 
> the likelihood of FPs.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: Trends in spam

Posted by Ned Slider <ne...@unixmail.co.uk>.
On 26/11/11 21:36, Karsten Bräckelmann wrote:
> On Sat, 2011-11-26 at 19:46 +0000, Ned Slider wrote:
>> # URIs matching http://some.domain.com/profile/12FirstLastname/
>> uri		LOCAL_URI_PROFILE
>> m{https?://.{1,40}/profile/\d\d[A-Z][a-z]{1,20}[A-Z][a-z]{1,20}/}
>               ^^^^^^^
>
> Using [^/]+ for the domain part instead should be slightly more
> efficient, since it will never result in backtracking. Also stricter,
> since it anchors the /profile directory at the domain.
>

Now you made me go away and read the perl RE manual!

Nice - [^/] meaning to match any character that is not a "/", one or 
more (+) times. I'd not come across negated character classes before so 
thanks for the tip!

And anchoring the /profile directory at the domain definitely reduces 
the likelihood of FPs.


Re: Trends in spam

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Sat, 2011-11-26 at 19:46 +0000, Ned Slider wrote:
> # URIs matching http://some.domain.com/profile/12FirstLastname/
> uri		LOCAL_URI_PROFILE 
> m{https?://.{1,40}/profile/\d\d[A-Z][a-z]{1,20}[A-Z][a-z]{1,20}/}
             ^^^^^^^

Using [^/]+ for the domain part instead should be slightly more
efficient, since it will never result in backtracking. Also stricter,
since it anchors the /profile directory at the domain.

Not really that important in this case, but generally worth to keep in
mind.

The upper limit of 40 chars in the original RE is good practice. :)
Though not necessary with mutually exclusive char sets as in my snippet.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}