Posted to users@spamassassin.apache.org by Matthew Newton <mc...@leicester.ac.uk> on 2004/12/07 16:32:22 UTC

New rules

Hello,

I've recently installed SA 3.0.1, and found some junk was
getting through with scores too low for my liking, especially before the
URLs made it into SURBL. I've put together a few rules to match some
of these that you might find interesting.

They are:

Rolex and "Want Watch?" messages (there must be loads of rules out there
to do this, I guess, but the default installation doesn't seem to
include any?)

header    UOLCC_ROLEX_SUB1   Subject =~ /\brolex\b/i
describe  UOLCC_ROLEX_SUB1   Subject contains the word 'rolex'
score     UOLCC_ROLEX_SUB1   0.5

header    UOLCC_ROLEX_SUB2   Subject =~ /\br.{1,2}o.{1,2}l.{1,2}e.{1,2}x\b/i
describe  UOLCC_ROLEX_SUB2   Subject contains a gappy version of 'rolex'
score     UOLCC_ROLEX_SUB2   1.5

body      UOLCC_ROLEX_BODY1  /\brolex\b/i
describe  UOLCC_ROLEX_BODY1  Body contains the word 'rolex'
score     UOLCC_ROLEX_BODY1  0.5

body      UOLCC_ROLEX_BODY2  /\br.{1,2}o.{1,2}l.{1,2}e.{1,2}x\b/i
describe  UOLCC_ROLEX_BODY2  Body contains a gappy version of 'rolex'
score     UOLCC_ROLEX_BODY2  1.5

rawbody   UOLCC_WATCH_BODY   /^(Do you )?[Ww]ant (a )?(cheap )?([Ww]ristw|W)atch\?\s*$/m
describe  UOLCC_WATCH_BODY   Body asks if you want a watch
score     UOLCC_WATCH_BODY   2
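
To see what the "gappy" pattern accepts, here is a minimal Python sketch. It is
only an illustration: the regexes are copied from the rules above, but the
helper name classify and the sample strings are made up, and Python's re module
is assumed to behave like Perl for these simple patterns. Because each .{1,2}
needs at least one character, the gappy rule does not fire on the plain word
'rolex'; that case is left to the SUB1/BODY1 rules.

```python
import re

# Patterns copied from UOLCC_ROLEX_SUB1/BODY1 and UOLCC_ROLEX_SUB2/BODY2.
PLAIN_ROLEX = re.compile(r"\brolex\b", re.I)
GAPPY_ROLEX = re.compile(r"\br.{1,2}o.{1,2}l.{1,2}e.{1,2}x\b", re.I)


def classify(text):
    """Report which of the two patterns would fire on the given text."""
    return {
        "plain": PLAIN_ROLEX.search(text) is not None,
        "gappy": GAPPY_ROLEX.search(text) is not None,
    }


if __name__ == "__main__":
    # Hypothetical sample strings, not taken from real mail.
    for sample in ("Buy a Rolex today", "Cheap R.o.l.e.x here", "r--o--l--e--x"):
        print(sample, classify(sample))
```

Note that '.' matches any character, so the gap can be a letter as well as
punctuation or a space.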

This one checks for messages containing two consecutive lines made up only of
b, B, space and 1. It seems to be some sort of code used in spam, maybe:

full      UOLCC_BBONE        /\n[bB1 ]{8,20}\n[bB1 ]{8,20}\n/s
describe  UOLCC_BBONE        Contains two code lines with b, B and 1
score     UOLCC_BBONE        2
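
A small Python sketch of what this pattern accepts (the helper name hits_bbone
and the sample messages are invented for illustration). One thing it shows is
that, since the character class includes the space, two consecutive lines of 8
to 20 spaces also match. Whether that happens in real mail is not something
this sketch can tell you.

```python
import re

# Pattern copied from UOLCC_BBONE.
BBONE = re.compile(r"\n[bB1 ]{8,20}\n[bB1 ]{8,20}\n", re.S)


def hits_bbone(message):
    """Return True if the message contains two adjacent 'code' lines."""
    return BBONE.search(message) is not None


if __name__ == "__main__":
    # Hypothetical messages, not taken from a real corpus.
    coded = "intro\nbB1b B1bB\nB1 bB1bb1\nrest"
    spaces_only = "intro\n" + " " * 10 + "\n" + " " * 12 + "\nrest"
    ordinary = "intro\nplain text line\nanother line\n"
    for name, msg in (("coded", coded), ("spaces", spaces_only), ("ordinary", ordinary)):
        print(name, hits_bbone(msg))
```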

Checking one particular type of spam that has a URL (that follows a
certain pattern, ends .htm), blank line, line of proverb or something,
blank, line of name, blank, exact same URL with "l" on the end (i.e.
ends .html). I guess the rules should be small, but this one has picked
up loads of spam for me:

full      UOLCC_HTM_HTML_URL /\n(http:\/\/[a-z]+\.[a-z]{3,4}\/[0-9a-f]{5,35}\/[[:alnum:]]{5,20}=?\.htm)\s\n\s*\n[[:alnum:]\?\.',\s:,-]+\n\s*\n[^\s,.]+(\s[^\s,.]+){0,15}\n\s*\n\1l/s
describe  UOLCC_HTM_HTML_URL Matches pattern of spam mail (.htm .html)
score     UOLCC_HTM_HTML_URL 3.5
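
The key trick in this rule is the backreference \1, which requires the second
URL to be the first one plus a trailing "l". The Python sketch below tries the
same idea on a made-up message. Assumptions: Python's re module has no POSIX
classes, so [[:alnum:]] is rewritten as [A-Za-z0-9]; the URLs, the text lines
and the helper build_sample are invented. As written, the pattern also expects
one whitespace character between the first URL and its line break, which the
sample supplies as a trailing space.

```python
import re

# Python translation of UOLCC_HTM_HTML_URL ([[:alnum:]] -> [A-Za-z0-9]).
HTM_HTML_URL = re.compile(
    r"\n(http://[a-z]+\.[a-z]{3,4}/[0-9a-f]{5,35}/[A-Za-z0-9]{5,20}=?\.htm)\s\n"
    r"\s*\n[A-Za-z0-9?.',\s:-]+\n"
    r"\s*\n[^\s,.]+(\s[^\s,.]+){0,15}\n"
    r"\s*\n\1l",
    re.S,
)

# Hypothetical URL used only for this illustration.
URL_HTM = "http://example.com/0a1b2c3d4e/abcde12345.htm"


def build_sample(first_url, second_url, after_first_url=" "):
    """Build a message laid out as: URL, blank, proverb, blank, name, blank, URL."""
    lines = [
        "Hello",
        first_url + after_first_url,
        "",
        "A stitch in time saves nine",
        "",
        "John Smith",
        "",
        second_url,
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    same = build_sample(URL_HTM, URL_HTM + "l")
    other = build_sample(URL_HTM, "http://example.org/0a1b2c3d4e/abcde12345.html")
    print(HTM_HTML_URL.search(same) is not None)
    print(HTM_HTML_URL.search(other) is not None)
```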

Finally, a string of words (more than 15 here) that all begin with a
capital letter, and no punctuation (I'm only testing this one at the
moment, hence the low score):

body      UOLCC_CAPWORD_TEST /([A-Z][a-z]{3,}\s{1,2}){15,}/s
describe  UOLCC_CAPWORD_TEST String of words that all begin with caps letter
score     UOLCC_CAPWORD_TEST 0.1


Hope these are of use to someone. If anyone can show me that they are
likely to pick up false positives, I'd be most grateful.

Thanks,

-- 
Matthew Newton <mc...@le.ac.uk>

UNIX Systems Administrator, Network Support Section,
Computer Centre, University of Leicester,
Leicester LE1 7RH, United Kingdom

Re: New rules

Posted by Robert Menschel <Ro...@Menschel.net>.
Hello Matthew,

Tuesday, December 7, 2004, 7:32:22 AM, you wrote:

MN> Hello,

MN> I've recently installed SA 3.0.1, and found some junk was
MN> getting through with scores too low for my liking, especially before the
MN> URLs made it into SURBL. I've put together a few rules to match some
MN> of these that you might find interesting.

My mass-check results on your rules:

Section 3 -- Frequencies Log
(First numeric frequencies, followed by percentage frequencies)

OVERALL    SPAM      HAM      S/O    RANK  SCORE  NAME
  95120    59679    35441    0.627   0.00   0.00  (all messages)
   2231     2231        0    1.000   1.00   0.50  UOLCC_ROLEX_SUB1
   3331     3330        1    0.999   0.90   0.50  UOLCC_ROLEX_BODY1
    470      470        0    1.000   0.86   2.00  UOLCC_WATCH_BODY
     19       19        0    1.000   0.45   1.50  UOLCC_ROLEX_BODY2
      0        0        0    0.500   0.31   3.50  UOLCC_HTM_HTML_URL
      0        0        0    0.500   0.31   1.50  UOLCC_ROLEX_SUB2
     66       37       29    0.431   0.10   0.10  UOLCC_CAPWORD_TEST
    136       51       85    0.263   0.00   2.00  UOLCC_BBONE

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
  95120    59679    35441    0.627   0.00    0.00  (all messages)
100.000  62.7407  37.2593    0.627   0.00    0.00  (all messages as %)
  2.345   3.7383   0.0000    1.000   1.00    0.50  UOLCC_ROLEX_SUB1
  3.502   5.5799   0.0028    0.999   0.90    0.50  UOLCC_ROLEX_BODY1
  0.494   0.7875   0.0000    1.000   0.86    2.00  UOLCC_WATCH_BODY
  0.020   0.0318   0.0000    1.000   0.45    1.50  UOLCC_ROLEX_BODY2
  0.000   0.0000   0.0000    0.500   0.31    3.50  UOLCC_HTM_HTML_URL
  0.000   0.0000   0.0000    0.500   0.31    1.50  UOLCC_ROLEX_SUB2
  0.069   0.0620   0.0818    0.431   0.10    0.10  UOLCC_CAPWORD_TEST
  0.143   0.0855   0.2398    0.263   0.00    2.00  UOLCC_BBONE

The single words Rolex in subject and/or body are the best hitters,
but then nobody in my domains discusses buying Rolex watches as
birthday or anniversary presents.
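
For anyone reading the S/O column: the figures above are consistent with
S/O = SPAM% / (SPAM% + HAM%), with 0.500 shown when a rule has no hits at all.
The Python sketch below recomputes a few rows on that assumption. The formula
is inferred from the table rather than taken from the mass-check tools, and
the function name s_o_ratio is made up.

```python
# Corpus sizes taken from the "(all messages)" row of the table.
TOTAL_SPAM = 59679
TOTAL_HAM = 35441


def s_o_ratio(spam_hits, ham_hits, total_spam=TOTAL_SPAM, total_ham=TOTAL_HAM):
    """Spam share of a rule's hits, normalised by corpus size (assumed formula)."""
    spam_pct = 100.0 * spam_hits / total_spam
    ham_pct = 100.0 * ham_hits / total_ham
    if spam_pct + ham_pct == 0:
        return 0.5  # no hits on either side, as in the zero-hit rows
    return spam_pct / (spam_pct + ham_pct)


if __name__ == "__main__":
    # (rule name, spam hits, ham hits) copied from the count table.
    for name, spam, ham in (
        ("UOLCC_BBONE", 51, 85),
        ("UOLCC_CAPWORD_TEST", 37, 29),
        ("UOLCC_ROLEX_SUB2", 0, 0),
    ):
        print(name, round(s_o_ratio(spam, ham), 3))
```

A value well below 0.5, as for UOLCC_BBONE, means the rule hit a larger share
of the ham than of the spam in this corpus.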

Bob Menschel




Re: New rules

Posted by Alex Broens <sa...@alexb.ch>.
Matthew Newton wrote:
> On Wed, Dec 08, 2004 at 02:22:07PM +0100, Alex Broens wrote:
> 
>>Matthew Newton wrote:
>>
>>>I've recently installed SA 3.0.1, and found some junk was
>>>getting through with scores too low for my liking, especially before the
>>>URLs made it into SURBL. I've put together a few rules to match some
>>>of these that you might find interesting.
>>>
>>>They are:
>>>
>>>Finally, a string of words (more than 15 here) that all begin with a
>>>capital letter, and no punctuation (I'm only testing this one at the
>>>moment, hence the low score):
>>>
>>>body      UOLCC_CAPWORD_TEST /([A-Z][a-z]{3,}\s{1,2}){15,}/s
>>>describe  UOLCC_CAPWORD_TEST String of words that all begin with caps letter
>>>score     UOLCC_CAPWORD_TEST 0.1
>>>
>>>
>>>Hope these are of use to someone. If anyone can show me that they are
>>>likely to pick up false positives, I'd be most grateful.
>>
>>This will likely trigger on several airline ticket confirmation messages 
>>which, for some unknown highly scientific reason, are always sent all caps.
> 
> 
> Do they send out e-mails with Each Word Starting With A Capital Letter
> with no punctuation between 15 words and all words longer than 3
> letters? 
> 
> I would expect perhaps everything in capitals, but not the above?

Yep... all in capitals, not starting only.

goofed it.

Alex




Re: New rules

Posted by Matthew Newton <mc...@leicester.ac.uk>.
On Wed, Dec 08, 2004 at 02:22:07PM +0100, Alex Broens wrote:
> Matthew Newton wrote:
> >
> >I've recently installed SA 3.0.1, and found some junk was
> >getting through with scores too low for my liking, especially before the
> >URLs made it into SURBL. I've put together a few rules to match some
> >of these that you might find interesting.
> >
> >They are:
> >
> >Finally, a string of words (more than 15 here) that all begin with a
> >capital letter, and no punctuation (I'm only testing this one at the
> >moment, hence the low score):
> >
> >body      UOLCC_CAPWORD_TEST /([A-Z][a-z]{3,}\s{1,2}){15,}/s
> >describe  UOLCC_CAPWORD_TEST String of words that all begin with caps letter
> >score     UOLCC_CAPWORD_TEST 0.1
> >
> >
> >Hope these are of use to someone. If anyone can show me that they are
> >likely to pick up false positives, I'd be most grateful.
> 
> This will likely trigger on several airline ticket confirmation messages 
> which, for some unknown highly scientific reason, are always sent all caps.

Do they send out e-mails with Each Word Starting With A Capital Letter
with no punctuation between 15 words and all words longer than 3
letters? 

I would expect perhaps everything in capitals, but not the above?
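
To make the distinction concrete, here is a small Python sketch of the
UOLCC_CAPWORD_TEST pattern run against made-up text (the sample strings and
the helper name hits_capword are invented; the regex is copied from the rule).
Each word must be one capital followed by at least three lower-case letters,
so all-caps text should not match, and a single short word breaks the run.

```python
import re

# Pattern copied from UOLCC_CAPWORD_TEST.
CAPWORD = re.compile(r"([A-Z][a-z]{3,}\s{1,2}){15,}", re.S)


def hits_capword(text):
    """Return True if the text has 15 or more consecutive Capitalised words."""
    return CAPWORD.search(text) is not None


if __name__ == "__main__":
    # Hypothetical samples: 16 Title Case words, 20 all-caps words, and a
    # Title Case run interrupted by the one-letter word "A".
    title_case = " ".join(["Word"] * 16)
    all_caps = " ".join(["FLIGHT"] * 20)
    interrupted = "Word " * 10 + "A " + "Word " * 10
    for name, text in (("title", title_case), ("caps", all_caps), ("broken", interrupted)):
        print(name, hits_capword(text))
```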

Thanks

Matthew


-- 
Matthew Newton <mc...@le.ac.uk>

UNIX Systems Administrator, Network Support Section,
Computer Centre, University of Leicester,
Leicester LE1 7RH, United Kingdom

Re: New rules

Posted by Loren Wilton <lw...@earthlink.net>.
> Getting off topic here, but the all caps is probably a holdover from
> the old SABRE airline reservation system which used a 6-bit codeset
> to reduce the transmission time on their (at the time) slow data links.

Actually it was because the SABRE machines also used a 5-bit code set (and
still largely do).  (Assuming of course SABRE was the Sperry rather than IBM
reservation system; I forget which was which.)

        Loren


Re: New rules

Posted by Bill Randle <bi...@neocat.org>.
On Wed, 2004-12-08 at 05:22, Alex Broens wrote:
> Matthew Newton wrote:
> > Hello,
> > 
> > I've recently installed SA 3.0.1, and found some junk was
> > getting through with scores too low for my liking, especially before the
> > URLs made it into SURBL. I've put together a few rules to match some
> > of these that you might find interesting.
> > 
> > They are:
> > 
> > Finally, a string of words (more than 15 here) that all begin with a
> > capital letter, and no punctuation (I'm only testing this one at the
> > moment, hence the low score):
> > 
> > body      UOLCC_CAPWORD_TEST /([A-Z][a-z]{3,}\s{1,2}){15,}/s
> > describe  UOLCC_CAPWORD_TEST String of words that all begin with caps letter
> > score     UOLCC_CAPWORD_TEST 0.1
> > 
> > 
> > Hope these are of use to someone. If anyone can show me that they are
> > likely to pick up false positives, I'd be most grateful.
> 
> This will likely trigger on several airline ticket confirmation messages 
> which, for some unknown highly scientific reason, are always sent all caps.

Getting off topic here, but the all caps is probably a holdover from 
the old SABRE airline reservation system which used a 6-bit codeset
to reduce the transmission time on their (at the time) slow data links.

	-Bill



Re: New rules

Posted by Alex Broens <sa...@alexb.ch>.
Matthew Newton wrote:
> Hello,
> 
> I've recently installed SA 3.0.1, and found some junk was
> getting through with scores too low for my liking, especially before the
> URLs made it into SURBL. I've put together a few rules to match some
> of these that you might find interesting.
> 
> They are:
> 
> Finally, a string of words (more than 15 here) that all begin with a
> capital letter, and no punctuation (I'm only testing this one at the
> moment, hence the low score):
> 
> body      UOLCC_CAPWORD_TEST /([A-Z][a-z]{3,}\s{1,2}){15,}/s
> describe  UOLCC_CAPWORD_TEST String of words that all begin with caps letter
> score     UOLCC_CAPWORD_TEST 0.1
> 
> 
> Hope these are of use to someone. If anyone can show me that they are
> likely to pick up false positives, I'd be most grateful.

This will likely trigger on several airline ticket confirmation messages 
which, for some unknown highly scientific reason, are always sent all caps.


Alex