Posted to users@spamassassin.apache.org by Matthew Newton <mc...@leicester.ac.uk> on 2004/12/07 16:32:22 UTC
New rules
Hello,
I've recently installed SA 3.0.1, and found some junk was
getting through with scores too low for my liking, especially before the
URLs made it into SURBL. I've put together a few rules to match some
of these that you might find interesting.
They are:
Rolex and "Want Watch?" messages (there must be loads of rules out there
to do this, I guess, but the default installation doesn't seem to
include any?)
header UOLCC_ROLEX_SUB1 Subject =~ /\brolex\b/i
describe UOLCC_ROLEX_SUB1 Subject contains the word 'rolex'
score UOLCC_ROLEX_SUB1 0.5
header UOLCC_ROLEX_SUB2 Subject =~ /\br.{1,2}o.{1,2}l.{1,2}e.{1,2}x\b/i
describe UOLCC_ROLEX_SUB2 Subject contains a gappy version of 'rolex'
score UOLCC_ROLEX_SUB2 1.5
body UOLCC_ROLEX_BODY1 /\brolex\b/i
describe UOLCC_ROLEX_BODY1 Body contains the word 'rolex'
score UOLCC_ROLEX_BODY1 0.5
body UOLCC_ROLEX_BODY2 /\br.{1,2}o.{1,2}l.{1,2}e.{1,2}x\b/i
describe UOLCC_ROLEX_BODY2 Body contains a gappy version of 'rolex'
score UOLCC_ROLEX_BODY2 1.5
rawbody UOLCC_WATCH_BODY /^(Do you )?[Ww]ant (a )?(cheap )?([Ww]ristw|W)atch\?\s*$/m
describe UOLCC_WATCH_BODY Body asks if you want a watch
score UOLCC_WATCH_BODY 2
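A rough way to see what the rolex and watch patterns above catch is to try them outside SpamAssassin. The sketch below is only an approximation in Python: it assumes Python's re module behaves like Perl for these particular patterns, and the sample strings and the helper name hits are made up for illustration. Two things it shows: the gappy pattern needs at least one character between each letter, so it does not overlap with the plain-word rule, and the watch pattern as written accepts "Watch" or "[Ww]ristwatch" but not a lone lowercase "watch".

```python
import re

# Python translations of the patterns above (the /i flag becomes re.I).
PLAIN = re.compile(r"\brolex\b", re.I)
GAPPY = re.compile(r"\br.{1,2}o.{1,2}l.{1,2}e.{1,2}x\b", re.I)
# The rawbody rule uses /m, so ^ and $ anchor at line boundaries.
WATCH = re.compile(r"^(Do you )?[Ww]ant (a )?(cheap )?([Ww]ristw|W)atch\?\s*$", re.M)


def hits(text):
    """Return the names of the sketched patterns that match text."""
    names = []
    if PLAIN.search(text):
        names.append("plain")
    if GAPPY.search(text):
        names.append("gappy")
    if WATCH.search(text):
        names.append("watch")
    return names


# Hypothetical sample lines, not taken from real mail.
print(hits("Cheap Rolex replicas"))      # plain word only
print(hits("Cheap R.o.l.e.x replicas"))  # gappy form only
print(hits("Want Watch?"))               # watch line
print(hits("Want a watch?"))             # lowercase "watch" is not matched
```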
Checking messages with two lines of just b, B, space and 1 in them.
Seems to be some sort of code used in spam, maybe:
full UOLCC_BBONE /\n[bB1 ]{8,20}\n[bB1 ]{8,20}\n/s
describe UOLCC_BBONE Contains two code lines with b, B and 1
score UOLCC_BBONE 2
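The character class in this rule includes the space, so two consecutive lines made up only of 8 to 20 spaces satisfy it just as well as the coded lines do. The following Python sketch illustrates that with made-up message fragments; it assumes the pattern behaves the same in Python's re module as in Perl, which should hold here since it uses only literals, a character class and counted repetition.

```python
import re

# Python translation of the rule's pattern; /s has no effect here because
# the pattern contains no "." metacharacter.
BBONE = re.compile(r"\n[bB1 ]{8,20}\n[bB1 ]{8,20}\n")

# Hypothetical message fragments for illustration.
coded = "Hello\nbB1b B1bB\n1bB b1Bb1\nBye\n"
blank = "Hello\n" + " " * 10 + "\n" + " " * 12 + "\nBye\n"
short = "Hello\nbB1\nbB1\nBye\n"

print(bool(BBONE.search(coded)))  # two coded lines
print(bool(BBONE.search(blank)))  # two lines of spaces only
print(bool(BBONE.search(short)))  # lines shorter than 8 characters
```

If whitespace-only lines turn out to be a problem, one option (untested here) would be to require at least one b, B or 1 on each line.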
Checking one particular type of spam that has a URL (that follows a
certain pattern, ends .htm), blank line, line of proverb or something,
blank, line of name, blank, exact same URL with "l" on the end (i.e.
ends .html). I guess the rules should be small, but this one has picked
up loads of spam for me:
full UOLCC_HTM_HTML_URL /\n(http:\/\/[a-z]+\.[a-z]{3,4}\/[0-9a-f]{5,35}\/[[:alnum:]]{5,20}=?\.htm)\s\n\s*\n[[:alnum:]\?\.',\s:,-]+\n\s*\n[^\s,.]+(\s[^\s,.]+){0,15}\n\s*\n\1l/s
describe UOLCC_HTM_HTML_URL Matches pattern of spam mail (.htm .html)
score UOLCC_HTM_HTML_URL 3.5
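The key trick in this rule is the backreference: the first URL is captured and the final line must be that same text followed by "l". Below is a minimal Python sketch of the idea, not a test of SpamAssassin itself. It assumes ASCII matching, so the POSIX class [[:alnum:]] is written out as A-Za-z0-9 (Python's re module does not support POSIX classes), and the URL, proverb, name and the helper build are invented for the example. The sample puts a trailing space after the first URL because the pattern's "\s\n" asks for one whitespace character before that line's newline.

```python
import re

# [[:alnum:]] is spelled out as A-Za-z0-9 (assumption: ASCII is intended).
HTM_HTML = re.compile(
    r"\n(http:\/\/[a-z]+\.[a-z]{3,4}\/[0-9a-f]{5,35}\/[A-Za-z0-9]{5,20}=?\.htm)\s\n"
    r"\s*\n[A-Za-z0-9\?\.',\s:,-]+\n"
    r"\s*\n[^\s,.]+(\s[^\s,.]+){0,15}\n"
    r"\s*\n\1l"
)

# Hypothetical URL, made up for this sketch.
URL = "http://example.info/0123abcdef/abcde12345.htm"


def build(first, second):
    """Assemble a sample body: URL, blank, proverb, blank, name, blank, URL."""
    # The trailing space after the first URL satisfies the pattern's "\s\n".
    return (
        "Hi\n" + first + " \n"
        "\n"
        "A stitch in time saves nine.\n"
        "\n"
        "John Smith\n"
        "\n" + second + "\n"
    )


same = HTM_HTML.search(build(URL, URL + "l"))
other = HTM_HTML.search(build(URL, "http://example.info/ffff0000aa/zzzzz99999.html"))

print(same is not None)   # second URL is the first one plus "l"
print(other is not None)  # a different .html URL does not satisfy \1l
```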
Finally, a string of words (15 or more here) that all begin with a
capital letter, with no punctuation between them (I'm only testing
this one at the moment, hence the low score):
body UOLCC_CAPWORD_TEST /([A-Z][a-z]{3,}\s{1,2}){15,}/s
describe UOLCC_CAPWORD_TEST String of words that all begin with caps letter
score UOLCC_CAPWORD_TEST 0.1
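To get a feel for what this pattern does and does not match, here is a small Python sketch. It assumes the pattern behaves the same in Python's re module as in Perl, and the word list is filler invented for the example. Each repetition needs an initial capital followed by at least three lowercase letters and then one or two whitespace characters, so text written entirely in capitals does not match, and neither does a run of only 14 such words.

```python
import re

# Python translation of the body rule's pattern.
CAPWORD = re.compile(r"([A-Z][a-z]{3,}\s{1,2}){15,}")

# Hypothetical filler words: each has an initial capital and at least
# three lowercase letters after it.
words = ["Lorem", "Ipsum", "Dolor", "Amet", "Tempor", "Magna", "Aliqua",
         "Minim", "Veniam", "Nostrud", "Ullamco", "Laboris", "Nisi",
         "Aliquip", "Commodo"]

title_run = " ".join(words) + " end"                     # 15 capitalised words
all_caps = " ".join(w.upper() for w in words) + " end"   # same words, upper case
too_short = " ".join(words[:14]) + " end"                # only 14 words

print(bool(CAPWORD.search(title_run)))
print(bool(CAPWORD.search(all_caps)))
print(bool(CAPWORD.search(too_short)))
```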
Hope these are of use to someone. If anyone can show me that they are
likely to pick up false positives, I'd be most grateful.
Thanks,
--
Matthew Newton <mc...@le.ac.uk>
UNIX Systems Administrator, Network Support Section,
Computer Centre, University of Leicester,
Leicester LE1 7RH, United Kingdom
Re: New rules
Posted by Robert Menschel <Ro...@Menschel.net>.
Hello Matthew,
Tuesday, December 7, 2004, 7:32:22 AM, you wrote:
MN> Hello,
MN> I've recently installed SA 3.0.1, and found some junk was
MN> getting through with scores too low for my liking, especially before the
MN> URLs made it into SURBL. I've put together a few rules to match some
MN> of these that you might find interesting.
My mass-check results on your rules:
Section 3 -- Frequencies Log
(First numeric frequencies, followed by percentage frequencies)
 OVERALL     SPAM      HAM    S/O   RANK  SCORE  NAME
   95120    59679    35441  0.627   0.00   0.00  (all messages)
    2231     2231        0  1.000   1.00   0.50  UOLCC_ROLEX_SUB1
    3331     3330        1  0.999   0.90   0.50  UOLCC_ROLEX_BODY1
     470      470        0  1.000   0.86   2.00  UOLCC_WATCH_BODY
      19       19        0  1.000   0.45   1.50  UOLCC_ROLEX_BODY2
       0        0        0  0.500   0.31   3.50  UOLCC_HTM_HTML_URL
       0        0        0  0.500   0.31   1.50  UOLCC_ROLEX_SUB2
      66       37       29  0.431   0.10   0.10  UOLCC_CAPWORD_TEST
     136       51       85  0.263   0.00   2.00  UOLCC_BBONE

OVERALL%    SPAM%     HAM%    S/O   RANK  SCORE  NAME
   95120    59679    35441  0.627   0.00   0.00  (all messages)
 100.000  62.7407  37.2593  0.627   0.00   0.00  (all messages as %)
   2.345   3.7383   0.0000  1.000   1.00   0.50  UOLCC_ROLEX_SUB1
   3.502   5.5799   0.0028  0.999   0.90   0.50  UOLCC_ROLEX_BODY1
   0.494   0.7875   0.0000  1.000   0.86   2.00  UOLCC_WATCH_BODY
   0.020   0.0318   0.0000  1.000   0.45   1.50  UOLCC_ROLEX_BODY2
   0.000   0.0000   0.0000  0.500   0.31   3.50  UOLCC_HTM_HTML_URL
   0.000   0.0000   0.0000  0.500   0.31   1.50  UOLCC_ROLEX_SUB2
   0.069   0.0620   0.0818  0.431   0.10   0.10  UOLCC_CAPWORD_TEST
   0.143   0.0855   0.2398  0.263   0.00   2.00  UOLCC_BBONE
The single words Rolex in subject and/or body are the best hitters,
but then nobody in my domains discusses buying Rolex watches as
birthday or anniversary presents.
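For anyone reading the S/O column: the figures above are consistent with S/O being the spam hit rate divided by the sum of the spam and ham hit rates, with each rate taken relative to its own corpus size. That reading is inferred from the numbers in the table rather than stated in the thread, and the function name below is made up for the sketch. It explains why UOLCC_CAPWORD_TEST comes out below 0.5 even though it hit more spam messages than ham messages in absolute terms.

```python
def s_over_o(spam_hits, ham_hits, total_spam, total_ham):
    """Spam share of a rule's hits, computed from per-corpus hit rates."""
    spam_rate = spam_hits / total_spam
    ham_rate = ham_hits / total_ham
    if spam_rate + ham_rate == 0:
        return 0.5  # no hits at all: the table shows 0.500 for such rules
    return spam_rate / (spam_rate + ham_rate)


TOTAL_SPAM, TOTAL_HAM = 59679, 35441  # from the "(all messages)" row

for name, spam, ham in [
    ("UOLCC_ROLEX_SUB1", 2231, 0),
    ("UOLCC_CAPWORD_TEST", 37, 29),
    ("UOLCC_BBONE", 51, 85),
    ("UOLCC_ROLEX_SUB2", 0, 0),
]:
    print(f"{name:20s} {s_over_o(spam, ham, TOTAL_SPAM, TOTAL_HAM):.3f}")
```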
Bob Menschel
Re: New rules
Posted by Alex Broens <sa...@alexb.ch>.
Matthew Newton wrote:
> On Wed, Dec 08, 2004 at 02:22:07PM +0100, Alex Broens wrote:
>
>>Matthew Newton wrote:
>>
>>>I've recently installed SA 3.0.1, and found some junk was
>>>getting through with scores too low for my liking, especially before the
>>>URLs made it into SURBL. I've put together a few rules to match some
>>>of these that you might find interesting.
>>>
>>>They are:
>>>
>>>Finally, a string of words (more than 15 here) that all begin with a
>>>capital letter, and no punctuation (I'm only testing this one at the
>>>moment, hence the low score):
>>>
>>>body UOLCC_CAPWORD_TEST /([A-Z][a-z]{3,}\s{1,2}){15,}/s
>>>describe UOLCC_CAPWORD_TEST String of words that all begin with caps letter
>>>score UOLCC_CAPWORD_TEST 0.1
>>>
>>>
>>>Hope these are of use to someone. If anyone can show me that they are
>>>likely to pick up false positives, I'd be most grateful.
>>
>>This will likely trigger on several airline ticket confirmation messages
>>which, for some unknown highly scientific reason, are always sent all caps.
>
>
> Do they send out e-mails with Each Word Starting With A Capital Letter
> with no punctuation between 15 words and all words longer than 3
> letters?
>
> I would expect perhaps everything in capitals, but not the above?
Yep... all in capitals, not only the first letter.
I goofed.
Alex
Re: New rules
Posted by Matthew Newton <mc...@leicester.ac.uk>.
On Wed, Dec 08, 2004 at 02:22:07PM +0100, Alex Broens wrote:
> Matthew Newton wrote:
> >
> >I've recently installed SA 3.0.1, and found some junk was
> >getting through with scores too low for my liking, especially before the
> >URLs made it into SURBL. I've put together a few rules to match some
> >of these that you might find interesting.
> >
> >They are:
> >
> >Finally, a string of words (more than 15 here) that all begin with a
> >capital letter, and no punctuation (I'm only testing this one at the
> >moment, hence the low score):
> >
> >body UOLCC_CAPWORD_TEST /([A-Z][a-z]{3,}\s{1,2}){15,}/s
> >describe UOLCC_CAPWORD_TEST String of words that all begin with caps letter
> >score UOLCC_CAPWORD_TEST 0.1
> >
> >
> >Hope these are of use to someone. If anyone can show me that they are
> >likely to pick up false positives, I'd be most grateful.
>
> This will likely trigger on several airline ticket confirmation messages
> which, for some unknown highly scientific reason, are always sent all caps.
Do they send out e-mails with Each Word Starting With A Capital Letter
with no punctuation between 15 words and all words longer than 3
letters?
I would expect perhaps everything in capitals, but not the above?
Thanks
Matthew
--
Matthew Newton <mc...@le.ac.uk>
UNIX Systems Administrator, Network Support Section,
Computer Centre, University of Leicester,
Leicester LE1 7RH, United Kingdom
Re: New rules
Posted by Loren Wilton <lw...@earthlink.net>.
> Getting off topic here, but the all caps is probably a holdover from
> the old SABRE airline reservation system which used a 6-bit codeset
> to reduce the transmission time on their (at the time) slow data links.
Actually it was because the SABRE machines also used a 6-bit code set (and
still largely do). (Assuming of course SABRE was the Sperry rather than IBM
reservation system; I forget which was which.)
Loren
Re: New rules
Posted by Bill Randle <bi...@neocat.org>.
On Wed, 2004-12-08 at 05:22, Alex Broens wrote:
> Matthew Newton wrote:
> > Hello,
> >
> > I've recently installed SA 3.0.1, and found some junk was
> > getting through with scores too low for my liking, especially before the
> > URLs made it into SURBL. I've put together a few rules to match some
> > of these that you might find interesting.
> >
> > They are:
> >
> > Finally, a string of words (more than 15 here) that all begin with a
> > capital letter, and no punctuation (I'm only testing this one at the
> > moment, hence the low score):
> >
> > body UOLCC_CAPWORD_TEST /([A-Z][a-z]{3,}\s{1,2}){15,}/s
> > describe UOLCC_CAPWORD_TEST String of words that all begin with caps letter
> > score UOLCC_CAPWORD_TEST 0.1
> >
> >
> > Hope these are of use to someone. If anyone can show me that they are
> > likely to pick up false positives, I'd be most grateful.
>
> This will likely trigger on several airline ticket confirmation messages
> which, for some unknown highly scientific reason, are always sent all caps.
Getting off topic here, but the all caps is probably a holdover from
the old SABRE airline reservation system which used a 6-bit codeset
to reduce the transmission time on their (at the time) slow data links.
-Bill
Re: New rules
Posted by Alex Broens <sa...@alexb.ch>.
Matthew Newton wrote:
> Hello,
>
> I've recently installed SA 3.0.1, and found some junk was
> getting through with scores too low for my liking, especially before the
> URLs made it into SURBL. I've put together a few rules to match some
> of these that you might find interesting.
>
> They are:
>
> Finally, a string of words (more than 15 here) that all begin with a
> capital letter, and no punctuation (I'm only testing this one at the
> moment, hence the low score):
>
> body UOLCC_CAPWORD_TEST /([A-Z][a-z]{3,}\s{1,2}){15,}/s
> describe UOLCC_CAPWORD_TEST String of words that all begin with caps letter
> score UOLCC_CAPWORD_TEST 0.1
>
>
> Hope these are of use to someone. If anyone can show me that they are
> likely to pick up false positives, I'd be most grateful.
This will likely trigger on several airline ticket confirmation messages
which, for some unknown highly scientific reason, are always sent all caps.
Alex