Posted to users@spamassassin.apache.org by Alex <my...@gmail.com> on 2018/01/14 01:28:29 UTC

Using fuzzy patterns

Hi,

I don't think I fully understand how to use the fuzzy rules with a proper regex:

From: "F*e dE x" <fe...@speedpost.com>

That address hardly resembles "Fed Ex", but how general of a rule can
we create and still catch variations such as this?

I thought something like this would work:

header    FUZZY_FEDEX   From =~
/(?!f.?e.?d.{0,3}e.?x)<F>.?<E>.?<D>.{0,3}<E>.?<X>/i

Re: Using fuzzy patterns

Posted by Alex <my...@gmail.com>.
Hi,

On Sun, Jan 14, 2018 at 7:58 AM, David Jones <dj...@ena.com> wrote:
> On 01/14/2018 01:45 AM, Rupert Gallagher wrote:
>>
>> Good question!
>>
>> One may write the regex backwards: if it matches "fedex" in the address,
>> but does not match "FedEx" in the name, then... However, there are many
>> cases where this will fail or return false positives.
>>
>> One may say that fedex is a brand name that only fedex can use, so if the
>> pattern matches anywhere in the From string (comment and address), and the
>> last Received from IP is not in fedex's SPF, then it is spam. This will
>> catch phishes like
>>
>> From: "FedEx invoices invoices@fedex.com" <fo...@example.com>
>>
>
> I have put fedex.com in 60_whitelist_auth.cf so you should be seeing legit
> email from Fedex scoring very low.  Create local rules to add points to
> "fedex" and other strings you find from spoofing.

Yes, I'm doing that here locally, but I was interested in these
edge cases where "fedex" is so thoroughly obscured that rules adding
points for "fedex" don't match, yet it is still readable enough to a
human that my users will notice (and complain, as always).

I was also interested in understanding more about these fuzzy regex
rules and how to use them to my advantage.

Re: Using fuzzy patterns

Posted by David Jones <dj...@ena.com>.
On 01/14/2018 01:45 AM, Rupert Gallagher wrote:
> Good question!
> 
> One may write the regex backwards: if it matches "fedex" in the address, 
> but does not match "FedEx" in the name, then... However, there are many 
> cases where this will fail or return false positives.
> 
> One may say that fedex is a brand name that only fedex can use, so if 
> the pattern matches anywhere in the From string (comment and 
> address), and the last Received from IP is not in fedex's SPF, then it 
> is spam. This will catch phishes like
> 
> From: "FedEx invoices invoices@fedex.com" <fo...@example.com>
> 

I have put fedex.com in 60_whitelist_auth.cf so you should be seeing 
legit email from Fedex scoring very low.  Create local rules to add 
points to "fedex" and other strings you find from spoofing.

> 
> On Sun, Jan 14, 2018 at 02:28, Alex <my...@gmail.com> wrote:
>> Hi, I don't think I fully understand how to use the fuzzy rules with a
>> proper regex:
>>
>> From: "F*e dE x" <fe...@speedpost.com>
>>
>> That address hardly resembles "Fed Ex", but how general of a rule can
>> we create and still catch variations such as this? I thought something
>> like this would work:
>>
>> header    FUZZY_FEDEX   From =~
>> /(?!f.?e.?d.{0,3}e.?x)<F>.?<E>.?<D>.{0,3}<E>.?<X>/i


-- 
David Jones

Re: Using fuzzy patterns

Posted by Rupert Gallagher <ru...@protonmail.com>.
Good question!

One may write the regex backwards: if it matches "fedex" in the address, but does not match "FedEx" in the name, then... However, there are many cases where this will fail or return false positives.

One may say that fedex is a brand name that only fedex can use, so if the pattern matches anywhere in the From string (comment and address), and the last Received from IP is not in fedex's SPF, then it is spam. This will catch phishes like

From: "FedEx invoices invoices@fedex.com" <fo...@example.com>
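
The heuristic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a SpamAssassin rule: AUTHORISED_NETWORKS is a hypothetical placeholder for whatever the brand's SPF record would resolve to (a real check needs DNS lookups), and looks_spoofed is a made-up helper name.

```python
import ipaddress
import re

# Hypothetical stand-in for the networks authorised by the brand's SPF record.
# 192.0.2.0/24 is a documentation-only range, used here purely as a placeholder.
AUTHORISED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

# Brand name anywhere in the From string (display name or address).
BRAND = re.compile(r"fed\s?ex", re.IGNORECASE)

def looks_spoofed(from_header, relay_ip):
    """True if the brand appears in From but the relay IP is not authorised."""
    if not BRAND.search(from_header):
        return False
    ip = ipaddress.ip_address(relay_ip)
    return not any(ip in net for net in AUTHORISED_NETWORKS)

spoof = '"FedEx invoices invoices@fedex.com" <fo...@example.com>'
print(looks_spoofed(spoof, "198.51.100.7"))  # True: relay outside the placeholder range
print(looks_spoofed(spoof, "192.0.2.25"))    # False: relay inside the placeholder range
```

As noted above, a check this broad can still produce false positives, for example on legitimate mail that merely mentions the brand in the display name.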


On Sun, Jan 14, 2018 at 02:28, Alex <my...@gmail.com> wrote:

> Hi, I don't think I fully understand how to use the fuzzy rules with a proper regex:
>
> From: "F*e dE x" <fe...@speedpost.com>
>
> That address hardly resembles "Fed Ex", but how general of a rule can we create and still catch variations such as this? I thought something like this would work:
>
> header FUZZY_FEDEX From =~ /(?!f.?e.?d.{0,3}e.?x)<F>.?<E>.?<D>.{0,3}<E>.?<X>/i

Re: Using fuzzy patterns

Posted by sh...@shanew.net.
On Sat, 13 Jan 2018, Alex wrote:

> From: "F*e dE x" <fe...@speedpost.com>
>
> That address hardly resembles "Fed Ex", but how general of a rule can
> we create and still catch variations such as this?
>
> I thought something like this would work:
>
> header    FUZZY_FEDEX   From =~
> /(?!f.?e.?d.{0,3}e.?x)<F>.?<E>.?<D>.{0,3}<E>.?<X>/i

To fully debug this, I think we need to know the replace_tag
definitions you've set for these characters.  That said, the first
thing I notice is that the negative lookahead pattern matches your
From header (twice, I think).  This means that no matter what follows,
this rule will not trigger.  I suspect you want the negative lookahead
to be more strictly correct, like "(?!fed ex)".

You may also want to use "From:name =~" to limit the search to the
non-address portion of the header.
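
The lookahead problem described above can be reproduced outside SpamAssassin. What follows is a minimal Python sketch, not the poster's actual setup: the TAGS table is a hypothetical stand-in for the replace_tag definitions (which were not posted), expand() only loosely imitates tag substitution, and "(?!fed ?ex)" is a slight variation on the suggested "(?!fed ex)" so that "FedEx" written without a space is excluded as well. For the constructs used here, Python's re module treats lookaheads the same way Perl does.

```python
import re

# Hypothetical stand-ins for the <F>, <E>, <D>, <X> replace_tag definitions;
# the real definitions were not posted in the thread.
TAGS = {"F": "f", "E": "[e3]", "D": "d", "X": "x"}

def expand(pattern):
    """Replace each <TAG> with its stand-in, loosely imitating tag expansion."""
    return re.sub(r"<([A-Z])>", lambda m: "(?:" + TAGS[m.group(1)] + ")", pattern)

BODY = r"<F>.?<E>.?<D>.{0,3}<E>.?<X>"

# Original rule: the negative lookahead is as loose as the fuzzy body itself.
original = re.compile(expand(r"(?!f.?e.?d.{0,3}e.?x)" + BODY), re.IGNORECASE)
# Stricter lookahead: only the correctly spelled brand is excluded.
stricter = re.compile(expand(r"(?!fed ?ex)" + BODY), re.IGNORECASE)

header = '"F*e dE x" <fe...@speedpost.com>'

print(bool(original.search(header)))    # False: the lookahead vetoes the only candidate
print(bool(stricter.search(header)))    # True: the obfuscated name is caught
print(bool(stricter.search('"FedEx"')))   # False: correct spelling is left alone
print(bool(stricter.search('"Fed Ex"')))  # False
```

With these stand-ins, the loose lookahead matches everything the fuzzy body would match on this header, so the original rule cannot fire on it; tightening the lookahead to the literal spelling leaves the obfuscated variants for the body to catch.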


-- 
Public key #7BBC68D9 at            |                 Shane Williams
http://pgp.mit.edu/                |      System Admin - UT CompSci
=----------------------------------+-------------------------------
All syllogisms contain three lines |              shanew@shanew.net
Therefore this is not a syllogism  | www.ischool.utexas.edu/~shanew