Posted to users@spamassassin.apache.org by Bert Van de Poel <be...@ulyssis.org> on 2019/09/16 02:53:34 UTC
Custom rule aware of occurrences
Dear fellow SpamAssassin users,
I'm contacting you as a member of ULYSSIS. ULYSSIS is a student
non-profit organisation at the University of Leuven trying to make
computers and technology more approachable and available to students. As
part of this objective, we run a hosting service within our university's
network for student organisations, student unions and individuals at our
university.
We've battled spam from time to time, since we seem to attract a lot of
spam in exotic languages, which is rather good at circumventing commonly
used filtering methods. This has led us to resort to some custom
rulesets to fight mostly targeted French and SEO spam, often coming from
very reputable servers and perfectly normal-looking addresses.
Now, because SEO spam in particular has been adapting quite well to any
rule we come up with (finding alternative ways of saying the same thing
time and time again), I was hoping to write a rule that basically boils
down to "give some spam score to emails that contain the word SEO 3 or
more times" to push those already detected by other rules over the
edge. To be clear, this will be a low-score rule; I'm aware that ham can
perfectly well contain that word 3 times, just like this email, for
example. While investigating, I started wondering how to deal with the
fact that some spam has only a plain-text body, while other spam also
includes an HTML part, which means the count may suddenly double (or
halve). Beyond that, it seems quite hacky to use a regex that boils down
to something like /\bSEO\b.*\bSEO\b.*\bSEO\b/i instead of something that
is properly aware of how often certain words occur.
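For comparison, the regex approach described above would look roughly like this as a SpamAssassin body rule. This is only a sketch: the rule name and score are made up for illustration. One caveat applies if I read the SpamAssassin documentation for body rules correctly. The rendered body is split into paragraphs, and each pattern is matched against each paragraph independently. This rule would therefore only fire when all three occurrences sit in the same paragraph.

```
# Hypothetical rule name and score, for illustration only.
# Fires when a single body paragraph contains "SEO" as a whole word three times.
body     SEO_3X_NAIVE  /\bSEO\b.*\bSEO\b.*\bSEO\b/i
describe SEO_3X_NAIVE  One paragraph mentions "SEO" three or more times
score    SEO_3X_NAIVE  0.5
```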
Since I sort of expected SpamAssassin to have a solution for both the
plain text vs. text+HTML problem and the counting problem, I asked
around on IRC but was pointed here. So uhm, any suggestions or pointers
are more than welcome. I'm not sure whether any more information is
required, but feel free to ask questions or correct my assumptions if
necessary.
Kind regards,
Bert Van de Poel
ULYSSIS
University of Leuven
Re: Custom rule aware of occurrences
Posted by "Kevin A. McGrail" <km...@apache.org>.
On 9/15/2019 10:53 PM, Bert Van de Poel wrote:
Bert, off the cuff, SA handles things like this pretty readily. What we
normally ask for is a sample email showing the problem, with all headers
intact. Put it up on pastebin.com, since it's likely to be blocked if
you email it.
You likely want a rule that looks for SEO with the "multiple" and
"maxhits" tflags set. You can look at
http://www.mcgrail.com/downloads/KAM.cf for examples.
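Here is a minimal sketch of what such a rule might look like. It assumes the tflags semantics described in the SpamAssassin configuration documentation. "multiple" keeps counting every match of a body test for use in meta rules. "maxhits=N" caps that count at N. The rule names and the 0.5 score are hypothetical placeholders.

Because each match is counted separately, the occurrences do not have to sit in one paragraph, as a single /\bSEO\b.*\bSEO\b.*\bSEO\b/i pattern would require. Whether an HTML part and its plain-text alternative both contribute hits is worth checking against a real sample. For example, run it through `spamassassin -D -t < message.eml`.

```
# Hypothetical rule names; the score is a low placeholder to tune locally.
# Subrule (leading "__" means it is not scored on its own):
# every whole-word "SEO" in the rendered body counts as one hit.
body     __SEO_WORD  /\bSEO\b/i
# "multiple" keeps matching after the first hit; "maxhits=3" stops at three,
# since further hits would not change the meta rule's result.
tflags   __SEO_WORD  multiple maxhits=3

# Meta rule: fires only when the subrule collected three or more hits.
meta     SEO_3X      (__SEO_WORD >= 3)
describe SEO_3X      Body mentions "SEO" three or more times
score    SEO_3X      0.5
```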
Regards,
KAM
--
Kevin A. McGrail
KMcGrail@Apache.org
Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171