Posted to users@spamassassin.apache.org by Bert Van de Poel <be...@ulyssis.org> on 2019/09/16 02:53:34 UTC

Custom rule aware of occurrences

Dear fellow SpamAssassin users,

I'm contacting you as a member of ULYSSIS. ULYSSIS is a student 
non-profit organisation at the University of Leuven trying to make 
computers and technology more approachable and available to students. As 
part of this objective, we run a hosting service within our university's 
network for student organisations, student unions and individuals at our 
university.

We've battled with spam from time to time, since we seem to attract a 
lot of spam in exotic languages, which is rather good at circumventing 
commonly used methods. This has had us resort to some custom rulesets to 
battle against mostly targeted French and SEO spam, often coming from 
very respectable servers and very normal addresses.

Now because SEO spam specifically has been adapting quite well to any 
rule we think of (finding alternative ways of saying the same thing time 
and time again), I was hoping to write a rule that basically boiled down 
to "give some spam score to emails that contain the word SEO 3 or more 
times" to push those already being detected by other rules over the 
edge. To be clear, this will be a low-score rule; I'm aware that ham can 
perfectly well contain that word 3 times, just like this email for 
example. While investigating, I started wondering how to handle the fact 
that some spam has only a plain-text body while other spam also includes 
an HTML part, which means the count may suddenly double or halve. 
Beyond that, it seems quite hacky to use a regex that boils down to 
something like /\bSEO\b.*\bSEO\b.*\bSEO\b/i instead of something that is 
properly aware of the count of certain words.

Since I sort of expected SpamAssassin to have a solution for both the 
text/text+html and the counting problems, I asked around on IRC but was 
pointed here. So uhm, any suggestions or pointers are more than welcome. 
Not too sure if any more information is required, but feel free to ask 
questions or correct my presumptions if necessary.

Kind regards,
Bert Van de Poel
ULYSSIS
University of Leuven


Re: Custom rule aware of occurrences

Posted by "Kevin A. McGrail" <km...@apache.org>.
Bert, off the cuff, SA pretty readily handles things like this.  What we
normally ask for is a sample of an email with all headers showing the
problem.  Put it up on pastebin.com since it's likely to be blocked if
you email it.

You likely want a rule that looks for SEO and uses the "multiple" tflag
together with a "maxhits" limit, so that the number of hits can be
counted. You can look at http://www.mcgrail.com/downloads/KAM.cf for
examples.
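
As a rough illustration of that suggestion, a minimal sketch could look
like the following. The rule names (__LOCAL_SEO_WORD, LOCAL_SEO_MANY),
the description and the score are made up for this example and are not
taken from KAM.cf. It assumes a SpamAssassin version that supports
"tflags multiple" with "maxhits", and it should be checked with
"spamassassin --lint" and against real samples before use.

```
# Hypothetical sub-rule: matches the word SEO in the rendered body text.
# "multiple" lets the rule hit more than once per message, and
# "maxhits=3" stops counting once three hits have been found.
body      __LOCAL_SEO_WORD   /\bSEO\b/i
tflags    __LOCAL_SEO_WORD   multiple maxhits=3

# Hypothetical meta rule: fires when the sub-rule hit at least 3 times.
meta      LOCAL_SEO_MANY     (__LOCAL_SEO_WORD >= 3)
describe  LOCAL_SEO_MANY     Body mentions SEO three or more times
score     LOCAL_SEO_MANY     0.5   # illustrative low score, tune locally
```

The double-underscore prefix keeps the sub-rule from scoring on its own,
so only the meta rule contributes to the total. Body rules run on the
decoded text with HTML markup stripped; whether a message carrying both
a plain-text and an HTML part gets its words counted twice is worth
verifying on a sample message rather than assuming either way.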

Regards,

KAM

-- 
Kevin A. McGrail
KMcGrail@Apache.org

Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171