Posted to users@spamassassin.apache.org by Daniel Watts <d...@nielwatts.com> on 2005/10/30 20:20:34 UTC

Rule for the SOFTWARE spam

Here's an example:
=====================================================
Subject: Software
====================================================
[snipped because otherwise this post is blocked! See below for spam details]
====================================================



Here's what I've found consistent between spams:

1. Subject: Software
2. "New software on our site:"
3. At least 10 separate $ signs
4. At least 10 separate - signs
5. "Our site:"
6. http://at.least.four.parts



-------------------------------------------------------
Here's my attempt to write a rule to cover these. However, I am not an
expert with regexps and, even after some research, I am sure the rules
for the multiple matches on - and $ are incorrect. Could someone help me
sort those out?

-------------------RULES-------------------------------
header SOFTWARE_SPAM_SUBJECT Subject =~ /^Software *$/		
# matches Software in the subject

body SOFTWARE_SPAM_BODY1 /^New software on our site: *$/
#matches "New software on our site:" on its own line

body SOFTWARE_SPAM_BODY2 /(\$\d{1,3}\.\d{0,2}){10,}/s
#matches $xx.xx at least 10 times

body SOFTWARE_SPAM_BODY3 /( \- ){10,}/s				
#matches at least 10 hyphens with spaces round them

body SOFTWARE_SPAM_BODY4 /^Our site: *$/
#matches "Our site:" on its own line

body SOFTWARE_SPAM_BODY5 /http:\/\/(([a-zA-Z0-9]+[a-zA-Z0-9_-]*)\.){3,}([a-zA-Z0-9]+[a-zA-Z0-9_-]*)/
#matches url with 4 parts


meta SOFTWARE_SPAM (SOFTWARE_SPAM_SUBJECT && SOFTWARE_SPAM_BODY1 && SOFTWARE_SPAM_BODY2 && SOFTWARE_SPAM_BODY3 && SOFTWARE_SPAM_BODY4 && SOFTWARE_SPAM_BODY5)

score SOFTWARE_SPAM 10
--------------------------------------------------


Many thanks,
Daniel
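
A note on point 6: a body rule sees the rendered text, whereas
SpamAssassin also has a "uri" rule type that is matched against the
links extracted from the message. A minimal, untested sketch, assuming
the spam URLs always have at least four dot-separated host parts (the
rule name SOFTWARE_SPAM_URI is made up for illustration):

```
# Hypothetical rule name; matches http(s) URLs whose host has 4+ labels
uri SOFTWARE_SPAM_URI /^https?:\/\/(?:[a-z0-9][a-z0-9_-]*\.){3,}[a-z0-9][a-z0-9_-]*/i
```

Bear in mind that ordinary hosts such as www.example.co.uk also have
four parts, so this is only useful as one condition of the meta rule,
where it could stand in for SOFTWARE_SPAM_BODY5.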



Re: Rule for the SOFTWARE spam

Posted by Michael Monnerie <m....@zmi.at>.
On Sunday, 30 October 2005 20:20 Daniel Watts wrote:

I'm not The God Of Regex, but maybe this helps:

> body SOFTWARE_SPAM_BODY2 /(\$\d{1,3}\.\d{0,2}){10,}/s
> #matches $xx.xx at least 10 times

If you have "$133", your rule doesn't match, because it requires the
".". Try
body SOFTWARE_SPAM_BODY2 /\$\d{1,3}(?:\.|)\d{0,2}/

But I'm not sure what your rule is meant to do - with {10,} around it,
it would only find amounts run together with nothing between them, like
"$1$12$12$133$133$1$1$1$1$1" and such - did you want to find a
$<number>, or just ten $ signs in a row?

body TENDOLLARSIGNS /\${10,}/
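
If the goal is ten prices that are separated by other text, one possible
approach is to allow any non-$ characters after each amount. This is an
untested sketch, not a verified rule: it assumes every price starts with
"$" followed by a digit, and that all ten prices land in the same
paragraph, since as far as I know body rules are run against one
rendered paragraph at a time.

```
# Sketch: ten occurrences of "$<digit>", each followed by any run of non-$ text
body SOFTWARE_SPAM_BODY2 /(?:\$\d[^\$]*){10}/
```

The fixed count {10} is enough here; once ten amounts have been seen
there is no need to keep matching.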

> body SOFTWARE_SPAM_BODY3 /( \- ){10,}/s                         
> #matches at least 10 hyphens with spaces round them

This matches " - " repeated ten or more times back to back, e.g.
" -  -  -  -  -  -  -", so there are always 2 spaces between
consecutive "-". Probably you wanted this:

body SOFTWARE_SPAM_BODY3 / \-{10,} /

You don't need the brackets. Have a look at 
http://perldoc.perl.org/index-tutorials.html and read "perlrequick" 
and "perlretut" there.
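
Another way to count ten separate hyphens, instead of packing the count
into one regex, is to let SpamAssassin count the hits and test the total
in a meta rule. This is only a sketch: it assumes a SpamAssassin release
whose "tflags" supports "multiple" and "maxhits" (check perldoc
Mail::SpamAssassin::Conf for your version; I don't believe the 3.1
series has them), and the sub-rule name is made up.

```
# Hypothetical sub-rule: one hit per " - " found anywhere in the body
body   __SOFTWARE_SPAM_DASH  / - /
# Count repeated hits, but stop counting once 10 have been found
tflags __SOFTWARE_SPAM_DASH  multiple maxhits=10
# True when at least 10 spaced hyphens were seen
meta   SOFTWARE_SPAM_BODY3   (__SOFTWARE_SPAM_DASH >= 10)
```

The same pattern would work for the $ signs, and it does not require all
the hits to sit in a single paragraph.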

Best regards, zmi
-- 
// Michael Monnerie, Ing.BSc  ---   it-management Michael Monnerie
// http://zmi.at           Tel: 0660/4156531          Linux 2.6.11
// PGP Key:   "lynx -source http://zmi.at/zmi2.asc | gpg --import"
// Fingerprint: EB93 ED8A 1DCD BB6C F952  F7F4 3911 B933 7054 5879
// Keyserver: www.keyserver.net                 Key-ID: 0x70545879

Re: [Sare] Rule for the SOFTWARE spam

Posted by Daniel Watts <d...@nielwatts.com>.
>
>
> Robert Menschel wrote:
>
>> Hello Doc,
>>
>> Sunday, October 30, 2005, 10:36:39 AM, you wrote:
>> DS> Daniel,
>>
>> DS> I got a whole lot of these... if you could make that into a
>> DS> 00_software_DW.cf file I can run it through the SARE masscheckers, if
>> DS> you want.
>>
>> If they're good enough, they can be added to Fred's OEM file.
>>
>> Daniel, I didn't see any "score" lines in the extract Doc sent to the
>> SARE list ... the default score on any "unscored" rule is 1.0, so you
>> need to be aware that if any non-spam matches say 3 of your rules,
>> it'll get a 3.0 score just from those rules.
>>
>> For testing you might want to set those rules to score 0.01 (leave the
>> meta as it is), and then in production change those rules to
>> __nonscore rules.
>>
>> Bob Menschel
>>
>>  
>>
> Very kind of you guys.
> I must admit I'm not 100% sure what you mean by your
> 00_software_DW.cf file but I've attached my best guess to this email!
> Daniel
>
>------------------------------------------------------------------------
>
>#test cf file to run against software spam email
>#Tries to match several attributes within the email
>#Written by Daniel Watts Oct 2005 and submitted to the spamassassin list for public assessment and use.
>
># matches Software in the subject
>header SOFTWARE_SPAM_SUBJECT Subject =~ /^Software *$/       
>
>#matches "New software on our site:" on its own line
>body SOFTWARE_SPAM_BODY1 /^New software on our site: *$/
>
>#matches $xx.xx at least 10 times
>body SOFTWARE_SPAM_BODY2 /(\$\d{1,3}\.\d{0,2}){10,}/s
>
>
>#matches at least 10 hyphens with spaces round them
>body SOFTWARE_SPAM_BODY3 /( \- ){10,}/s               
>
>
>#matches "Our site:" on its own line
>body SOFTWARE_SPAM_BODY4 /^Our site: *$/
>
>
>#matches url with 4 parts
>body SOFTWARE_SPAM_BODY5 /http:\/\/(([a-zA-Z0-9]+[a-zA-Z0-9_-]*)\.){3,}([a-zA-Z0-9]+[a-zA-Z0-9_-]*)/
>
>
>
>meta SOFTWARE_SPAM (SOFTWARE_SPAM_SUBJECT && SOFTWARE_SPAM_BODY1 && SOFTWARE_SPAM_BODY2 && SOFTWARE_SPAM_BODY3 && SOFTWARE_SPAM_BODY4 && SOFTWARE_SPAM_BODY5)
>
>describe SOFTWARE_SPAM Unsolicited message selling software
>
>#None of these should singly mark a message as spam
>score SOFTWARE_SPAM_SUBJECT 0.01
>score SOFTWARE_SPAM_BODY1 0.01
>score SOFTWARE_SPAM_BODY2 0.01
>score SOFTWARE_SPAM_BODY3 0.01
>score SOFTWARE_SPAM_BODY4 0.01
>score SOFTWARE_SPAM_BODY5 0.01
>
>#The collection is almost certainly spam
>score SOFTWARE_SPAM 10 
>

Sorry, I noticed 2 typos.
Amended file attached again.


Re: [Sare] Rule for the SOFTWARE spam

Posted by Daniel Watts <d...@nielwatts.com>.
Hi guys,

Wow, this level of detailed feedback, received so fast, is great.
Evidently I didn't get it right the first time. Please bear with me - 
this is my first rule attempt.
I've spent another hour or so and I think I have it right this time.

If you would, please try the attached cf file against your mass
checkers; we should see a more positive result!
FILE: 01_software_DW.cf

Sincerely,
Daniel

Re: [Sare] Rule for the SOFTWARE spam

Posted by Daniel Watts <d...@nielwatts.com>.

Robert Menschel wrote:

>Hello Doc,
>
>Sunday, October 30, 2005, 10:36:39 AM, you wrote:
>DS> Daniel,
>
>DS> I got a whole lot of these... if you could make that into a 
>DS> 00_software_DW.cf file I can run it through the SARE masscheckers, if
>DS> you want.
>
>If they're good enough, they can be added to Fred's OEM file.
>
>Daniel, I didn't see any "score" lines in the extract Doc sent to the
>SARE list ... the default score on any "unscored" rule is 1.0, so you
>need to be aware that if any non-spam matches say 3 of your rules,
>it'll get a 3.0 score just from those rules.
>
>For testing you might want to set those rules to score 0.01 (leave the
>meta as it is), and then in production change those rules to
>__nonscore rules.
>
>Bob Menschel
>
>  
>
Very kind of you guys.
I must admit I'm not 100% sure what you mean by your
00_software_DW.cf file but I've attached my best guess to this email!
Daniel
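
For reference, Bob's suggestion about "__nonscore" rules can be sketched
as follows. Rules whose names begin with a double underscore are
sub-rules: they carry no score of their own and are not reported, so
only the meta rule affects the message. The patterns are copied
unchanged from the file above and only three of the six sub-rules are
shown; the others would be renamed the same way. Untested.

```
# Sub-rules: the "__" prefix means no individual score
header __SOFTWARE_SPAM_SUBJECT  Subject =~ /^Software *$/
body   __SOFTWARE_SPAM_BODY1    /^New software on our site: *$/
body   __SOFTWARE_SPAM_BODY4    /^Our site: *$/

# Only the combination is scored
meta     SOFTWARE_SPAM  (__SOFTWARE_SPAM_SUBJECT && __SOFTWARE_SPAM_BODY1 && __SOFTWARE_SPAM_BODY4)
describe SOFTWARE_SPAM  Unsolicited message selling software
score    SOFTWARE_SPAM  10
```

With this layout the six "score ... 0.01" lines used during testing are
no longer needed.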