Posted to users@spamassassin.apache.org by Daniel Watts <d...@nielwatts.com> on 2005/10/30 20:20:34 UTC
Rule for the SOFTWARE spam
Here's an example:
=====================================================
Subject: Software
====================================================
[snipped because otherwise this post is blocked! See below for spam details]
====================================================
Here's what I've found consistent across these spams:
1. Subject: Software
2. "New software on our site:"
3. At least 10 separate $ signs
4. At least 10 separate - signs
5. "Our site:"
6. http://at.least.four.parts
-------------------------------------------------------
Here's my attempt to write a rule to cover these. However, I am not an
expert with regexps, and even after some research I'm sure the rules for
the multiple matches on - and $ are incorrect. Could someone help me sort
those out?
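Points 3 and 4 ask for *separate* occurrences, which is a counting question rather than a single repeated regex group. A minimal Python sketch of the counting idea, assuming a price format of "$" plus 1 to 3 digits and optional cents, and a hyphen that stands alone between spaces; `count_separate` and `looks_like_price_list` are hypothetical helpers, and this is plain Python, not SpamAssassin rule syntax.

```python
import re


def count_separate(pattern: str, text: str) -> int:
    """Count non-overlapping matches anywhere in the text (not back to back)."""
    return len(re.findall(pattern, text))


# Assumed price format: "$" + 1-3 digits, optionally "." + 1-2 digits.
PRICE = r"\$\d{1,3}(?:\.\d{1,2})?"
# Assumed separator: a lone hyphen with a space on each side, as in "Title - $49.95".
SPACED_HYPHEN = r"(?<= )-(?= )"


def looks_like_price_list(text: str, minimum: int = 10) -> bool:
    """True when both the prices and the separators occur at least `minimum` times."""
    return (count_separate(PRICE, text) >= minimum
            and count_separate(SPACED_HYPHEN, text) >= minimum)


# Made-up sample body: ten "Product N - $price" lines.
sample = "\n".join(f"Product {i} - ${i}9.95" for i in range(1, 11))
print(count_separate(PRICE, sample))          # 10
print(count_separate(SPACED_HYPHEN, sample))  # 10
print(looks_like_price_list(sample))          # True
```

Because the matches are counted wherever they occur, line breaks and product names between the prices do not matter, which is the behaviour "at least 10 separate $ signs" seems to call for.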
-------------------RULES-------------------------------
header SOFTWARE_SPAM_SUBJECT Subject =~ /^Software *$/
# matches Software in the subject
body SOFTWARE_SPAM_BODY1 /^New software on our site: *$/
#matches "New software on our site:" on its own line
body SOFTWARE_SPAM_BODY2 /(\$\d{1,3}\.\d{0,2}){10,}/s
#matches $xx.xx at least 10 times
body SOFTWARE_SPAM_BODY3 /( \- ){10,}/s
#matches at least 10 hyphens with spaces round them
body SOFTWARE_SPAM_BODY4 /^Our site: *$/
#matches "Our site:" on its own line
body SOFTWARE_SPAM_BODY5 /http:\/\/(([a-zA-Z0-9]+[a-zA-Z0-9_-]*)\.){3,}([a-zA-Z0-9]+[a-zA-Z0-9_-]*)/
#matches url with 4 parts
meta SOFTWARE_SPAM (SOFTWARE_SPAM_SUBJECT && SOFTWARE_SPAM_BODY1 && SOFTWARE_SPAM_BODY2 && SOFTWARE_SPAM_BODY3 && SOFTWARE_SPAM_BODY4 && SOFTWARE_SPAM_BODY5)
score SOFTWARE_SPAM 10
--------------------------------------------------
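The URL rule (SOFTWARE_SPAM_BODY5) can be sanity-checked outside SpamAssassin. A minimal Python sketch with made-up URLs, assuming Python's `re` treats this simple pattern the same way Perl does. It shows that the `{3,}` group plus the final label needs at least four dot-separated parts, so a three-part host such as www.example.com does not match, while an ordinary four-part host such as shop.example.co.uk does, which is worth keeping in mind for false positives.

```python
import re

# SOFTWARE_SPAM_BODY5 pattern copied from the rule above; the escaped "\/"
# is just a literal "/" in Python, as in Perl.
BODY5 = re.compile(
    r"http:\/\/(([a-zA-Z0-9]+[a-zA-Z0-9_-]*)\.){3,}([a-zA-Z0-9]+[a-zA-Z0-9_-]*)"
)

# Made-up example URLs: four parts, three parts, and a legitimate four-part host.
for url in ["http://at.least.four.parts",
            "http://www.example.com/",
            "http://shop.example.co.uk/"]:
    print(url, bool(BODY5.search(url)))
# http://at.least.four.parts True
# http://www.example.com/ False
# http://shop.example.co.uk/ True
```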
Many thanks,
Daniel
Re: Rule for the SOFTWARE spam
Posted by Michael Monnerie <m....@zmi.at>.
On Sunday, 30 October 2005 20:20, Daniel Watts wrote:
I'm not The God Of Regex, but maybe this helps:
> body SOFTWARE_SPAM_BODY2 /(\$\d{1,3}\.\d{0,2}){10,}/s
> #matches $xx.xx at least 10 times
If a price is written as "$133", your rule doesn't match, because it
requires the "." to be there. Try
body SOFTWARE_SPAM_BODY2 /\$\d{1,3}(?:\.|)\d{0,2}/
But I'm not sure what your rule should do: as written it only matches ten
or more prices run together with nothing in between, like
"$1.00$12.50$9.99..." and such. Did you want to find a $<number>, or just
ten $$$$ signs in a row?
body TENDOLLARSIGNS /\${10,}/
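Michael's three observations about the "$" rules can be checked with a few made-up strings. A minimal Python sketch, assuming Python's `re` behaves like Perl for these simple patterns; the `/s` flag is dropped because it only changes what an unescaped `.` matches, and these patterns use only the escaped `\.`.

```python
import re

# Daniel's original rule, the single-price pattern it repeats, Michael's
# optional-dot variant, and Michael's ten-dollar-signs rule.
ORIGINAL = re.compile(r"(\$\d{1,3}\.\d{0,2}){10,}")
SINGLE_PRICE_STRICT = re.compile(r"\$\d{1,3}\.\d{0,2}")
SINGLE_PRICE_LOOSE = re.compile(r"\$\d{1,3}(?:\.|)\d{0,2}")
TEN_DOLLAR_SIGNS = re.compile(r"\${10,}")

# "$133" has no ".", so only the optional-dot version matches it.
print(bool(SINGLE_PRICE_STRICT.search("now $133 only")))  # False
print(bool(SINGLE_PRICE_LOOSE.search("now $133 only")))   # True

# {10,} repeats the group back to back: ten prices separated by spaces
# do not match, ten prices run together do.
spaced = " ".join(["$9.99"] * 10)
packed = "".join(["$9.99"] * 10)
print(bool(ORIGINAL.search(spaced)))  # False
print(bool(ORIGINAL.search(packed)))  # True

# \${10,} needs ten dollar signs in a row, not ten anywhere in the text.
print(bool(TEN_DOLLAR_SIGNS.search("$" * 10)))  # True
print(bool(TEN_DOLLAR_SIGNS.search(spaced)))    # False
```

The alternation `(?:\.|)` behaves the same as the shorter `\.?`: both try the dot first and fall back to matching nothing.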
> body SOFTWARE_SPAM_BODY3 /( \- ){10,}/s
> #matches at least 10 hyphens with spaces round them
This finds " -  -  -  - ..." (10 or more " - " groups), each hyphen with
its own space before and after, so there are 2 spaces between
consecutive "-". Probably you wanted this:
body SOFTWARE_SPAM_BODY3 / \-{10,} /
You don't need the brackets. Have a look at the tutorials at
http://perldoc.perl.org/index-tutorials.html, especially "perlrequick"
and "perlretut".
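The difference between the two hyphen patterns is easy to see on made-up strings. A minimal Python sketch, assuming Python's `re` handles these simple patterns the same way Perl does: Daniel's pattern needs each hyphen to carry its own spaces (two spaces between hyphens), while Michael's suggestion matches one unbroken run of ten or more hyphens with a space at each end.

```python
import re

ORIGINAL = re.compile(r"( \- ){10,}")   # " - " repeated ten or more times
SUGGESTED = re.compile(r" \-{10,} ")    # one run of 10+ hyphens, spaced at both ends

double_spaced = " - " * 10        # " -  -  - ...": two spaces between hyphens
single_spaced = " " + "- " * 10   # " - - - ...": one space between hyphens
ruler = " " + "-" * 20 + " "      # " -------------------- "

for name, text in [("double", double_spaced),
                   ("single", single_spaced),
                   ("ruler", ruler)]:
    print(name, bool(ORIGINAL.search(text)), bool(SUGGESTED.search(text)))
# double True False
# single False False
# ruler False True
```

Neither pattern counts ten " - " separators spread over separate lines; that needs the matches to be counted individually rather than repeated inside one regex.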
Regards, zmi
--
// Michael Monnerie, Ing.BSc --- it-management Michael Monnerie
// http://zmi.at Tel: 0660/4156531 Linux 2.6.11
// PGP Key: "lynx -source http://zmi.at/zmi2.asc | gpg --import"
// Fingerprint: EB93 ED8A 1DCD BB6C F952 F7F4 3911 B933 7054 5879
// Keyserver: www.keyserver.net Key-ID: 0x70545879
Re: [Sare] Rule for the SOFTWARE spam
Posted by Daniel Watts <d...@nielwatts.com>.
>
>
> Robert Menschel wrote:
>
>> Hello Doc,
>>
>> Sunday, October 30, 2005, 10:36:39 AM, you wrote:
>> DS> Daniel,
>>
>> DS> I got a whole lot of these... if you could make that into a
>> DS> 00_software_DW.cf file I can run it through the SARE masscheckers, if
>> DS> you want.
>>
>> If they're good enough, they can be added to Fred's OEM file.
>>
>> Daniel, I didn't see any "score" lines in the extract Doc sent to the
>> SARE list ... the default score on any "unscored" rule is 1.0, so you
>> need to be aware that if any non-spam matches say 3 of your rules,
>> it'll get a 3.0 score just from those rules.
>>
>> For testing you might want to set those rules to score 0.01 (leave the
>> meta as it is), and then in production change those rules to
>> __nonscore rules.
>>
>> Bob Menschel
>>
>>
>>
> Very kind of you guys.
> I must admit I'm not 100% sure what you mean by your
> 00_software_DW.cf file, but I've attached my best guess to this email!
> Daniel
>
>------------------------------------------------------------------------
>
>#test cf file to run against software spam email
>#Tries to match several attributes within the email
>#Written by Daniel Watts Oct 2005 and submitted to the spamassassin list for public assessment and use.
>
># matches Software in the subject
>header SOFTWARE_SPAM_SUBJECT Subject =~ /^Software *$/
>
>#matches "New software on our site:" on its own line
>body SOFTWARE_SPAM_BODY1 /^New software on our site: *$/
>
>#matches $xx.xx at least 10 times
>body SOFTWARE_SPAM_BODY2 /(\$\d{1,3}\.\d{0,2}){10,}/s
>
>
>#matches at least 10 hyphens with spaces round them
>body SOFTWARE_SPAM_BODY3 /( \- ){10,}/s
>
>
>#matches "Our site:" on its own line
>body SOFTWARE_SPAM_BODY4 /^Our site: *$/
>
>
>#matches url with 4 parts
>body SOFTWARE_SPAM_BODY5 /http:\/\/(([a-zA-Z0-9]+[a-zA-Z0-9_-]*)\.){3,}([a-zA-Z0-9]+[a-zA-Z0-9_-]*)/
>
>
>
>meta SOFTWARE_SPAM (SOFTWARE_SPAM_SUBJECT && SOFTWARE_SPAM_BODY1 && SOFTWARE_SPAM_BODY2 && SOFTWARE_SPAM_BODY3 && SOFTWARE_SPAM_BODY4 && SOFTWARE_SPAM_BODY5)
>
>describe SOFTWARE_SPAM Unsolicited message selling software
>
>#None of these should singly mark a message as spam
>score SOFTWARE_SPAM_SUBJECT 0.01
>score SOFTWARE_SPAM_BODY1 0.01
>score SOFTWARE_SPAM_BODY2 0.01
>score SOFTWARE_SPAM_BODY3 0.01
>score SOFTWARE_SPAM_BODY4 0.01
>score SOFTWARE_SPAM_BODY5 0.01
>
>#The collection is almost certainly spam
>score SOFTWARE_SPAM 10
>
Sorry, I noticed 2 typos.
Amended file attached again.
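With the subrules at 0.01 each and the meta at 10, the totals for the attached file work out as follows. A minimal Python model, assuming each of the six subrule lines is read as a `score` line at 0.01 and that a rule which fires adds its score once; `total_score` is a hypothetical helper, not SpamAssassin code.

```python
# Hypothetical model of the attached file's scoring: six subrules at 0.01
# each plus a meta rule at 10 that fires only when all six subrules fire.
SUBRULE_SCORE = 0.01
META_SCORE = 10.0
SUBRULES = [
    "SOFTWARE_SPAM_SUBJECT",
    "SOFTWARE_SPAM_BODY1",
    "SOFTWARE_SPAM_BODY2",
    "SOFTWARE_SPAM_BODY3",
    "SOFTWARE_SPAM_BODY4",
    "SOFTWARE_SPAM_BODY5",
]


def total_score(fired: set) -> float:
    """Points a message gets from this rule set, given the rules that fired."""
    hits = sum(1 for rule in SUBRULES if rule in fired)
    meta_fires = hits == len(SUBRULES)  # the meta is an AND of all six
    total = hits * SUBRULE_SCORE + (META_SCORE if meta_fires else 0.0)
    return round(total, 2)


print(total_score(set(SUBRULES)))      # 10.06: every subrule plus the meta
print(total_score(set(SUBRULES[:5])))  # 0.05: one subrule missing, no meta
```

A message that misses even one subrule gets almost nothing from this set, so the subrules on their own cannot push a legitimate message towards the spam threshold.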
Re: [Sare] Rule for the SOFTWARE spam
Posted by Daniel Watts <d...@nielwatts.com>.
Hi guys,
Wow, feedback this detailed, received so fast, is great.
Evidently I didn't get it right the first time. Please bear with me:
this is my first rule attempt.
I've spent another hour or so and I think I have it right this time.
If you would try the attached .cf file against your mass checkers, we
should see a more positive result!
FILE: 01_software_DW.cf
Sincerely,
Daniel
Re: [Sare] Rule for the SOFTWARE spam
Posted by Daniel Watts <d...@nielwatts.com>.
Robert Menschel wrote:
>Hello Doc,
>
>Sunday, October 30, 2005, 10:36:39 AM, you wrote:
>DS> Daniel,
>
>DS> I got a whole lot of these... if you could make that into a
>DS> 00_software_DW.cf file I can run it through the SARE masscheckers, if
>DS> you want.
>
>If they're good enough, they can be added to Fred's OEM file.
>
>Daniel, I didn't see any "score" lines in the extract Doc sent to the
>SARE list ... the default score on any "unscored" rule is 1.0, so you
>need to be aware that if any non-spam matches say 3 of your rules,
>it'll get a 3.0 score just from those rules.
>
>For testing you might want to set those rules to score 0.01 (leave the
>meta as it is), and then in production change those rules to
>__nonscore rules.
>
>Bob Menschel
>
>
>
Very kind of you guys.
I must admit I'm not 100% sure what you mean by your
00_software_DW.cf file, but I've attached my best guess to this email!
Daniel
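Robert's warning about unscored rules is simple arithmetic. A simplified Python model, assuming only what the thread states (a rule with no "score" line defaults to 1.0) plus SpamAssassin's convention that rules whose names start with "__" are never scored and only feed meta rules; the function names and the renamed rules are hypothetical.

```python
from typing import Dict, Optional

# Default for a rule that has no "score" line, per Robert's message.
DEFAULT_SCORE = 1.0


def rule_score(name: str, configured: Optional[float] = None) -> float:
    """Simplified model of how much one matching rule adds to a message."""
    if name.startswith("__"):
        return 0.0  # "__" rules are never scored; they only feed meta rules
    return DEFAULT_SCORE if configured is None else configured


def points(hit_rules: Dict[str, Optional[float]]) -> float:
    """Total points from the rules a message hit (name -> score, or None)."""
    return round(sum(rule_score(n, s) for n, s in hit_rules.items()), 2)


# A hypothetical legitimate message that happens to hit three subrules.
unscored = {"SOFTWARE_SPAM_BODY1": None, "SOFTWARE_SPAM_BODY4": None,
            "SOFTWARE_SPAM_BODY5": None}
testing = {name: 0.01 for name in unscored}          # Robert's testing setup
production = {"__" + name: None for name in unscored}  # renamed "__" subrules

print(points(unscored))    # 3.0
print(points(testing))     # 0.03
print(points(production))  # 0.0
```

If the subrules are renamed with the "__" prefix for production, the meta rule's expression has to use the new names as well.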