Posted to users@spamassassin.apache.org by Shane Williams <sh...@shanew.net> on 2020/07/14 22:02:46 UTC

Negative lookbehind in URIs?

I'm looking to detect a mismatch between the domain in the href
property of a URI and a domain in the anchor text itself.  It seems
like this is the right place for a negative lookbehind, and I don't
mind writing my own rule, but I can't help thinking that this has been
solved already.  Searching the list for lookbehind comes up with a
couple of instances of people getting errors (about a variable length
lookbehind), but I'm not finding anything like what I'm looking for.

Does anyone have a sample rule for this, or other suggestions on how
to detect this in SA (maybe a plugin)?

-- 
Public key #7BBC68D9 at            |                 Shane Williams
http://pgp.mit.edu/                |      System Admin - UT CompSci
=----------------------------------+-------------------------------
All syllogisms contain three lines |              shanew@shanew.net
Therefore this is not a syllogism  | www.ischool.utexas.edu/~shanew

Re: Negative lookbehind in URIs?

Posted by Pedro David Marco <pe...@yahoo.com>.
 Nice Loren....
nowadays with uri_detail this is easily solved with something like
uri_detail          HTTPS_HTTP_MISMATCH     text =~ /^https:\/\//i     cleaned =~ /^http:\/\//i
score               HTTPS_HTTP_MISMATCH     0.5
describe            HTTPS_HTTP_MISMATCH     URL claims to use SSL but it does not


---------Pedro

>On Wednesday, July 15, 2020, 02:20:34 AM GMT+2, Loren Wilton <lw...@earthlink.net> wrote:
>
>> I'm looking to detect a mismatch between the domain in the href
>> property of a URI and a domain in the anchor text itself.
>
>Not using lookbehind, but I long ago wrote these two rules to look for similar situations. Either could be modified fairly easily to do what you want.
>
>Note: these are probably around 10 years old, written before there were URI rules (if I remember correctly) so there may be more efficient ways to do these these days.
>
>        Loren
>
>#check for attempting to phish
>rawbody __LW_PHISH_2   m'<a\s+[\s\w=\.]*href=\"https?://\d+[^>]+>https://[^\d]'is
>full    __LW_PHISH_2a  m'<a\s+[\s\w=\.]*href=\"https?://\d+[^>]+>https://[^\d]'is
>meta    LW_PHISH_2     __LW_PHISH_2 || __LW_PHISH_2a
>score   LW_PHISH_2      50
>describe LW_PHISH_2    numeric href with https description
>#score   __LW_PHISH_2  1
>#score   __LW_PHISH_2a 1
>rawbody  __LW_PHISH_3  /<a\s+[\s\w=\.]*href=\"http:[^>]+>https:/is
>full     __LW_PHISH_3a /<a\s+[\s\w=\.]*href=\"http:[^>]+>https:/is
>meta     LW_PHISH_3    __LW_PHISH_3 || __LW_PHISH_3a
>score    LW_PHISH_3    50
>describe LW_PHISH_3    secure description with insecure link
>#score   __LW_PHISH_3  10
>#score   __LW_PHISH_3a 1  

Re: Negative lookbehind in URIs?

Posted by Loren Wilton <lw...@earthlink.net>.
> There are rough equivalents to these in the current default rules: 
> HTTPS_IP_MISMATCH and HTTPS_HTTP_MISMATCH.

I'm not surprised. Those were my original rules, which became SARE rules, 
and a number of those still exist under different names.

        Loren


Re: Negative lookbehind in URIs?

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 14 Jul 2020, at 20:20, Loren Wilton wrote:

>> I'm looking to detect a mismatch between the domain in the href
>> property of a URI and a domain in the anchor text itself.
>
> Not using lookbehind, but I long ago wrote these two rules to look for 
> similar situations. Either could be modified fairly easily to do what 
> you want.
>
> Note: these are probably around 10 years old, written before there 
> were URI rules (if I remember correctly) so there may be more 
> efficient ways to do these these days.
>
>         Loren
>
> #check for attempting to phish
> rawbody __LW_PHISH_2   m'<a\s+[\s\w=\.]*href=\"https?://\d+[^>]+>https://[^\d]'is
> full    __LW_PHISH_2a  m'<a\s+[\s\w=\.]*href=\"https?://\d+[^>]+>https://[^\d]'is
> meta    LW_PHISH_2     __LW_PHISH_2 || __LW_PHISH_2a
> score   LW_PHISH_2      50
> describe LW_PHISH_2    numeric href with https description
> #score   __LW_PHISH_2  1
> #score   __LW_PHISH_2a 1
>
> rawbody  __LW_PHISH_3  /<a\s+[\s\w=\.]*href=\"http:[^>]+>https:/is
> full     __LW_PHISH_3a /<a\s+[\s\w=\.]*href=\"http:[^>]+>https:/is
> meta     LW_PHISH_3    __LW_PHISH_3 || __LW_PHISH_3a
> score    LW_PHISH_3    50
> describe LW_PHISH_3    secure description with insecure link
> #score   __LW_PHISH_3  10
> #score   __LW_PHISH_3a 1

There are rough equivalents to these in the current default rules: 
HTTPS_IP_MISMATCH and HTTPS_HTTP_MISMATCH.


-- 
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not For Hire (currently)

Re: Negative lookbehind in URIs?

Posted by Loren Wilton <lw...@earthlink.net>.
> I'm looking to detect a mismatch between the domain in the href
> property of a URI and a domain in the anchor text itself.  

Not using lookbehind, but I long ago wrote these two rules to look for similar situations. Either could be modified fairly easily to do what you want.

Note: these are probably around 10 years old, written before there were URI rules (if I remember correctly) so there may be more efficient ways to do these these days.

        Loren

#check for attempting to phish
rawbody __LW_PHISH_2   m'<a\s+[\s\w=\.]*href=\"https?://\d+[^>]+>https://[^\d]'is
full    __LW_PHISH_2a  m'<a\s+[\s\w=\.]*href=\"https?://\d+[^>]+>https://[^\d]'is
meta    LW_PHISH_2     __LW_PHISH_2 || __LW_PHISH_2a
score   LW_PHISH_2      50
describe LW_PHISH_2    numeric href with https description
#score   __LW_PHISH_2  1
#score   __LW_PHISH_2a 1

rawbody  __LW_PHISH_3  /<a\s+[\s\w=\.]*href=\"http:[^>]+>https:/is
full     __LW_PHISH_3a /<a\s+[\s\w=\.]*href=\"http:[^>]+>https:/is
meta     LW_PHISH_3    __LW_PHISH_3 || __LW_PHISH_3a
score    LW_PHISH_3    50
describe LW_PHISH_3    secure description with insecure link
#score   __LW_PHISH_3  10
#score   __LW_PHISH_3a 1

Re: Negative lookbehind in URIs?

Posted by Laurent S <11...@protonmail.ch>.
Dear Shane,

Have you had a look at the uri_detail plugin? You should find 
interesting info there:

perldoc Mail::SpamAssassin::Plugin::URIDetail

I guess you should be able to do what you want with this plugin. But I 
rarely use it, so I can't help you further.

To catch the mismatches you mention, I prefer to use the phishing
signatures from ClamAV, which are very convenient to use.

https://www.clamav.net/documents/phishsigs

Lastly, as Bill Cole mentioned, you will have a lot of false positives. 
You should curate a list of commonly abused URIs and only try to catch 
those. There are too many ESPs rewriting links (for tracking purposes)... 
There are even banks using those ESPs...
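
The curated-list idea can be sketched with uri_detail, which checks each link's anchor text and its href domain in one rule. This is only an illustration under assumptions: the rule name is made up, example.com is a placeholder for a domain you actually see abused, and it assumes a SpamAssassin version whose URIDetail plugin accepts the !~ operator (check the perldoc for your install).

```
ifplugin Mail::SpamAssassin::Plugin::URIDetail
  # Hypothetical subrule: the anchor text mentions example.com,
  # but the link itself points at some other domain.
  uri_detail  __LOCAL_TEXT_EXAMPLE_HREF_ELSEWHERE  text =~ /(?:^|[^\w-])example\.com\b/i  domain !~ /^example\.com$/i
endif
```

Since uri_detail cannot compare one key against another, each watched domain needs its own rule like this one, which is what keeps the list curated.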

Best,

Laurent

On 15.07.20 00:02, Shane Williams wrote:
> 
> I'm looking to detect a mismatch between the domain in the href
> property of a URI and a domain in the anchor text itself.  It seems
> like this is the right place for a negative lookbehind, and I don't
> mind writing my own rule, but I can't help thinking that this has been
> solved already.  Searching the list for lookbehind comes up with a
> couple of instances of people getting errors (about a variable length
> lookbehind), but I'm not finding anything like what I'm looking for.
> 
> Does anyone have a sample rule for this, or other suggestions on how
> to detect this in SA (maybe a plugin)?
> 
> --
> Public key #7BBC68D9 at            |                 Shane Williams
> http://pgp.mit.edu/                |      System Admin - UT CompSci
> =----------------------------------+-------------------------------
> All syllogisms contain three lines |              shanew@shanew.net
> Therefore this is not a syllogism  | www.ischool.utexas.edu/~shanew
> 


Re: Negative lookbehind in URIs?

Posted by Pedro David Marco <pe...@yahoo.com>.
 
Bill, Shane...

We do that with a plugin because exceptions must be considered, for example to avoid false positives with rewritten URLs (used by some companies).
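
Without a plugin, a rough version of such an exception could be a meta rule that backs off when a known rewriter is present. This is a sketch, not what our plugin does: all rule names are hypothetical, rewriter.example is a placeholder host, and __LOCAL_HREF_TEXT_HOST_DIFF stands for a mismatch subrule defined elsewhere.

```
# Placeholder host: replace rewriter.example with link-rewriting services you trust.
uri       __LOCAL_KNOWN_REWRITER       /^https?:\/\/(?:[a-z0-9-]+\.)*rewriter\.example(?:[:\/]|$)/i

# __LOCAL_HREF_TEXT_HOST_DIFF is assumed to be your own mismatch subrule.
meta      LOCAL_MISMATCH_NOT_REWRITER  __LOCAL_HREF_TEXT_HOST_DIFF && !__LOCAL_KNOWN_REWRITER
describe  LOCAL_MISMATCH_NOT_REWRITER  Link text and href hosts differ, no known rewriter seen
score     LOCAL_MISMATCH_NOT_REWRITER  0.5
```

Note that this exception is per message, not per link: one rewritten URL anywhere in the mail suppresses the hit, which is coarser than what a plugin can do.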

-----Pedro.

  

Re: Negative lookbehind in URIs?

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 14 Jul 2020, at 18:02, Shane Williams wrote:

> I'm looking to detect a mismatch between the domain in the href
> property of a URI and a domain in the anchor text itself.

That will match a lot of ham. I'm not saying that it is a bad rule but 
it would probably need to be a component in meta-rules to be useful.
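
As a rough illustration of that kind of meta-rule, assuming the stock SPF_FAIL and FREEMAIL_FROM rules are available on your install. The LOCAL_ names are hypothetical, and __LOCAL_HREF_TEXT_HOST_DIFF is a placeholder for a mismatch subrule defined elsewhere.

```
# Only score the mismatch when another warning sign is also present.
# __LOCAL_HREF_TEXT_HOST_DIFF is a placeholder for your own mismatch subrule.
meta      LOCAL_MISMATCH_PLUS  __LOCAL_HREF_TEXT_HOST_DIFF && (SPF_FAIL || FREEMAIL_FROM)
describe  LOCAL_MISMATCH_PLUS  Link text and href hosts differ, plus another warning sign
score     LOCAL_MISMATCH_PLUS  1.0
```

The companion rules and the score are arbitrary choices for the sketch; they would need tuning against real ham and spam.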

> It seems
> like this is the right place for a negative lookbehind, and I don't
> mind writing my own rule, but I can't help thinking that this has been
> solved already.  Searching the list for lookbehind comes up with a
> couple of instances of people getting errors (about a variable length
> lookbehind), but I'm not finding anything like what I'm looking for.
>
> Does anyone have a sample rule for this, or other suggestions on how
> to detect this in SA (maybe a plugin)?

I'm also somewhat surprised to find that it's not there already, but 
indeed it is not.

If you come up with a good rule for it, PLEASE share it here. If you 
want it tested in RuleQA, I'd be happy to drop a candidate rule into my 
testing sandbox.
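
As a starting point for such a candidate, one way to avoid the variable-length lookbehind error is to capture the href host first and then use a backreference inside a negative lookahead on the anchor text. What follows is an untested sketch, not a stock rule. The rule name is made up. It compares full hostnames, so www.example.com versus example.com counts as a mismatch. It only sees anchor text that starts with http:// or https:// directly after the opening tag, within a single rawbody chunk.

```
# Hypothetical subrule: href host is captured as \1, then the anchor text
# must start with a URL whose host is NOT exactly \1.
# The (?![a-z0-9.-]) right after the capture stops the regex engine from
# shortening \1 to force a bogus mismatch.
rawbody  __LOCAL_HREF_TEXT_HOST_DIFF  /<a\s(?:[^>]{0,200}\s)?href=["']?https?:\/\/([a-z0-9.-]+)(?![a-z0-9.-])[^>]{0,300}>\s{0,10}https?:\/\/(?!\1(?![a-z0-9.-]))[a-z0-9.-]+/i
```

Because the name starts with a double underscore it scores nothing by itself, which fits the suggestion above to use it only as a component of meta-rules.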

-- 
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not For Hire (currently)