Posted to users@spamassassin.apache.org by Ken Bass <kb...@kenbass.com> on 2014/10/15 22:49:32 UTC

SA skipping URI processing

I'm using CentOS 7, which means SA version 3.3.2.

I am encountering several emails that are not being processed correctly 
when checking against URI rules.

1) My local.cf has a rule to address the new .link domain which spammers 
appear to be using recently:

uri LR_LINK_TLD /^(?:https?:\/\/|mailto:)[^\/]+\.link(?:\/|$)/i
describe LR_LINK_TLD  Contains a URL in the LINK top-level domain
score LR_LINK_TLD     3.0
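
[Editorial sketch] To rule out the pattern itself, the regex from the rule above can be exercised on its own. This is a minimal sketch, assuming a Python port of the Perl pattern behaves the same for these inputs; the sample URIs and the rule_hits helper are made up for illustration and are not part of SpamAssassin.

```python
import re

# Python port of the Perl pattern used by the LR_LINK_TLD rule.
LR_LINK_TLD = re.compile(r"^(?:https?:\/\/|mailto:)[^\/]+\.link(?:\/|$)", re.I)

# Hypothetical URIs, invented for this example.
samples = [
    "http://spam.example.link/offer",
    "HTTPS://EXAMPLE.LINK",
    "mailto:someone@example.link",
    "http://example.com/page.link",
    "http://example.linked.com/",
]


def rule_hits(uri):
    """Return True if the rule's pattern matches the given URI string."""
    return LR_LINK_TLD.search(uri) is not None


for uri in samples:
    print(uri, rule_hits(uri))
```

The first three samples match and the last two do not, which suggests the pattern is sound for any URI it is actually given. A uri rule can only fire on URIs that SA has already extracted from the message, so a miss is more likely to originate in that extraction step than in the regex.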

2) The URIDNSBL rules are not being executed for these emails either.

SA debug output shows an empty "domains to query:" list. Huh?
Oct 15 16:24:55.416 [15519] dbg: uridnsbl: domains to query:

Here is the pastebin link to the full spam email:

http://pastebin.com/RJWyGkKB


Re: SA skipping URI processing

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 10/15/2014 7:33 PM, Ken Bass wrote:
> On 10/15/2014 6:50 PM, Kevin A. McGrail wrote:
>> I'd have to dig into it to find out more but there are different 
>> modules used for different tests so deviation in behavior is not 
>> something that alarms me.  If you replace your RegistrarBoundaries.pm 
>> and it still has issues, please let us know. I am 99.9% sure I'm right.
>>
>> regards,
>> KAM
> Thanks -- My apologies for doubting you. Kind of scary that there is 
> a loophole that will grow each time a new tld is introduced. For now, 
> I'll just block the .link domain at the smtp level.
I'm an engineer, so doubt is a good thing. Trust but verify ;-)

But yes, we know the TLD issue is a growing pain point and we have some 
thoughts in progress to resolve it.


Re: SA skipping URI processing

Posted by Ken Bass <kb...@kenbass.com>.
On 10/15/2014 6:50 PM, Kevin A. McGrail wrote:
> I'd have to dig into it to find out more but there are different 
> modules used for different tests so deviation in behavior is not 
> something that alarms me.  If you replace your RegistrarBoundaries.pm 
> and it still has issues, please let us know. I am 99.9% sure I'm right.
>
> regards,
> KAM
Thanks -- My apologies for doubting you. Kind of scary that there is a 
loophole that will grow each time a new tld is introduced. For now, I'll 
just block the .link domain at the smtp level.

Re: SA skipping URI processing

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 10/15/2014 6:20 PM, Ken Bass wrote:
> On 10/15/2014 6:12 PM, Martin Gregorie wrote:
>> I'm certain KAM is right and here's why.
> ...snip...
>> IOW, uri rules depend on matching the terminal part of the domain name
>> with an entry in SA's built-in TLD list and my version, installed from
>> the Fedora repo, doesn't yet include .link.
>>
>> I reverted my rules and test messages to test for the .link TLD and am
>> now waiting for a TLD list that contains .link to percolate through the
>> Fedora update process.
>>
>>
> I think my confusion is that for many spam messages, the uri rule is 
> working fine for the .link domain.
> After looking at some different spam emails, I think the difference is 
> that if the .link is inside an 'HTML' spam, the url processing works. 
> If it is a normal text spam email, the url processing does not work. 
> That has been the source of my confusion and why I was thinking KAM 
> was referring to a different issue.
>
> So I am thinking that the HTML decoding part of SA doesn't use that 
> built-in TLD list, but the text email processing does. That is the 
> only way I can explain what I am seeing.

I'd have to dig into it to find out more but there are different modules 
used for different tests so deviation in behavior is not something that 
alarms me.  If you replace your RegistrarBoundaries.pm and it still has 
issues, please let us know. I am 99.9% sure I'm right.

regards,
KAM

Re: SA skipping URI processing

Posted by Martin Gregorie <ma...@gregorie.org>.
On Wed, 2014-10-15 at 18:20 -0400, Ken Bass wrote:
> On 10/15/2014 6:12 PM, Martin Gregorie wrote:
> > I'm certain KAM is right and here's why.
> ...snip...
> > IOW, uri rules depend on matching the terminal part of the domain name
> > with an entry in SA's built-in TLD list and my version, installed from
> > the Fedora repo, doesn't yet include .link.
> >
> > I reverted my rules and test messages to test for the .link TLD and am
> > now waiting for a TLD list that contains .link to percolate through the
> > Fedora update process.
> >
> >
> I think my confusion is that for many spam messages, the uri rule is 
> working fine for the .link domain.
> After looking at some different spam emails, I think the difference is 
> that if the .link is inside an 'HTML' spam, the url processing works. If 
> it is a normal text spam email, the url processing does not work. That 
> has been the source of my confusion and why I was thinking KAM was 
> referring to a different issue.
> 
> So I am thinking that the HTML decoding part of SA doesn't use that 
> built-in TLD list, but the text email processing does. That is the only 
> way I can explain what I am seeing.
> 
That's quite possible. My test messages are all plaintext or have the
URIs in plaintext MIME parts.


Martin




Re: SA skipping URI processing

Posted by Ken Bass <kb...@kenbass.com>.
On 10/15/2014 6:12 PM, Martin Gregorie wrote:
> I'm certain KAM is right and here's why.
...snip...
> IOW, uri rules depend on matching the terminal part of the domain name
> with an entry in SA's built-in TLD list and my version, installed from
> the Fedora repo, doesn't yet include .link.
>
> I reverted my rules and test messages to test for the .link TLD and am
> now waiting for a TLD list that contains .link to percolate through the
> Fedora update process.
>
>
I think my confusion is that for many spam messages, the uri rule is 
working fine for the .link domain.
After looking at some different spam emails, I think the difference is 
that if the .link is inside an 'HTML' spam, the url processing works. If 
it is a normal text spam email, the url processing does not work. That 
has been the source of my confusion and why I was thinking KAM was 
referring to a different issue.

So I am thinking that the HTML decoding part of SA doesn't use that 
built-in TLD list, but the text email processing does. That is the only 
way I can explain what I am seeing.
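
[Editorial sketch] The hypothesis above can be modelled in a few lines. This is only an illustration of the idea, not SpamAssassin's code: KNOWN_TLDS, HrefCollector, uris_from_html and uris_from_text are hypothetical names, and the assumption is that hrefs in HTML are taken as-is while plaintext URL candidates are kept only if their TLD is on a built-in list.

```python
import re
from html.parser import HTMLParser

# Assumed, deliberately stale TLD list; a real list is far longer.
KNOWN_TLDS = {"com", "net", "org"}


class HrefCollector(HTMLParser):
    """Collect href values from <a> tags without consulting any TLD list."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def uris_from_html(markup):
    """Return every anchor href found in the markup."""
    parser = HrefCollector()
    parser.feed(markup)
    parser.close()
    return parser.hrefs


def uris_from_text(text):
    """Keep URL-like strings only when the host ends in a known TLD."""
    found = []
    for match in re.finditer(r"https?://([^/\s]+)\S*", text, re.I):
        tld = match.group(1).rsplit(".", 1)[-1].lower()
        if tld in KNOWN_TLDS:
            found.append(match.group(0))
    return found


url = "http://spam.example.link/offer"
print(uris_from_html('<a href="%s">click</a>' % url))
print(uris_from_text("Visit %s today" % url))
```

Under these assumptions the same URL is reported from the HTML part but dropped from the plaintext part, which is the asymmetry described above. Whether SA 3.3.2 really behaves this way would need to be confirmed against its source.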

Re: SA skipping URI processing

Posted by Martin Gregorie <ma...@gregorie.org>.
On Wed, 2014-10-15 at 17:01 -0400, Ken Bass wrote:
> On 10/15/2014 4:52 PM, Kevin A. McGrail wrote:
> > On 10/15/2014 4:49 PM, Ken Bass wrote:
> >> 1) My local.cf has a rule to address the new .link domain which 
> >> spammers appear to be using recently:
> >>
> >> uri LR_LINK_TLD /^(?:https?:\/\/|mailto:)[^\/]+\.link(?:\/|$)/i
> >> describe LR_LINK_TLD  Contains a URL in the LINK top-level domain
> >> score LR_LINK_TLD     3.0
> >>
> > >> 2) The URIDNSBL rules are not being executed for these emails either.
> > >>
> > >> SA debug output shows an empty "domains to query:" list. Huh?
> >> Oct 15 16:24:55.416 [15519] dbg: uridnsbl: domains to query:
> >>
> >> Here is the pastebin link to the full spam email:
> >>
> >> http://pastebin.com/RJWyGkKB
> > The TLDs are hardcoded in SA 3.3.2.   We are working on not having 
> > them hard-coded in 3.4.1.
> >
> > I believe someone made a patch suitable for 3.3.2 but I can't find it 
> > at the moment.
> 
> Sorry, but I think you might be confusing this with some specific 
> TLD-related rule issue rather than the more generic custom uri rules and 
> uridnsbl rules that I am using, because these work fine on OTHER emails. 
> Something in specific emails, like the one in the above pastebin, is 
> causing the issue. I've got lots of other emails that hit the above 
> LR_LINK_TLD and/or URIBL_DBL_SPAM.
> 
I'm certain KAM is right and here's why.

I recently wrote a set of three experimental rules to detect *.link
in body text, Received headers and From headers, and set up some
test messages since I've yet to see any .link TLDs. The body text rule
was, of course, a URI rule. It didn't work, though the other two rules,
which used ordinary regexes with \.link as part of the expression,
worked as expected. Eventually, as a debugging aid, I changed the rules
and the test messages to search for \.com and all three rules worked as
expected.

IOW, uri rules depend on matching the terminal part of the domain name
with an entry in SA's built-in TLD list and my version, installed from
the Fedora repo, doesn't yet include .link. 

I reverted my rules and test messages to test for the .link TLD and am
now waiting for a TLD list that contains .link to percolate through the
Fedora update process.
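
[Editorial sketch] The behaviour described above, where the uri rule misses while plain regex rules hit, can be reproduced with a toy model. It assumes that plaintext URI detection is gated by a TLD list; parsed_uris, uri_rule_hits, body_rule_hits and both TLD sets are hypothetical and do not mirror SA internals.

```python
import re


def parsed_uris(text, known_tlds):
    """Model of plaintext URI detection gated by a TLD list (an assumption)."""
    uris = []
    for m in re.finditer(r"https?://([a-z0-9.-]+)[^\s]*", text, re.I):
        host = m.group(1).rstrip(".")
        if host.rsplit(".", 1)[-1].lower() in known_tlds:
            uris.append(m.group(0))
    return uris


def uri_rule_hits(text, known_tlds):
    """A 'uri' rule only sees what the URI parser handed over."""
    pattern = re.compile(r"^https?://[^/]+\.link(?:/|$)", re.I)
    return any(pattern.search(u) for u in parsed_uris(text, known_tlds))


def body_rule_hits(text):
    """A plain regex over the body needs no URI parsing at all."""
    return re.search(r"\.link\b", text, re.I) is not None


message = "Limited offer: http://deals.example.link/now"
stale = {"com", "net", "org"}   # list without .link
updated = stale | {"link"}      # list after an update

print(uri_rule_hits(message, stale))
print(uri_rule_hits(message, updated))
print(body_rule_hits(message))
```

With the stale list the uri-style check finds nothing, because no URI was ever extracted, while the body-style regex still matches. Adding "link" to the list makes the uri-style check succeed as well.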


HTH
Martin




Re: SA skipping URI processing

Posted by Ken Bass <kb...@kenbass.com>.
On 10/15/2014 4:52 PM, Kevin A. McGrail wrote:
> On 10/15/2014 4:49 PM, Ken Bass wrote:
>> 1) My local.cf has a rule to address the new .link domain which 
>> spammers appear to be using recently:
>>
>> uri LR_LINK_TLD /^(?:https?:\/\/|mailto:)[^\/]+\.link(?:\/|$)/i
>> describe LR_LINK_TLD  Contains a URL in the LINK top-level domain
>> score LR_LINK_TLD     3.0
>>
>> 2) The URIDNSBL rules are not being executed for these emails either.
>>
>> SA debug output shows an empty "domains to query:" list. Huh?
>> Oct 15 16:24:55.416 [15519] dbg: uridnsbl: domains to query:
>>
>> Here is the pastebin link to the full spam email:
>>
>> http://pastebin.com/RJWyGkKB
> The TLDs are hardcoded in SA 3.3.2.   We are working on not having 
> them hard-coded in 3.4.1.
>
> I believe someone made a patch suitable for 3.3.2 but I can't find it 
> at the moment.

Sorry, but I think you might be confusing this with some specific 
TLD-related rule issue rather than the more generic custom uri rules and 
uridnsbl rules that I am using, because these work fine on OTHER emails. 
Something in specific emails, like the one in the above pastebin, is 
causing the issue. I've got lots of other emails that hit the above 
LR_LINK_TLD and/or URIBL_DBL_SPAM.



Re: SA skipping URI processing

Posted by Ken Bass <kb...@kenbass.com>.
On 10/15/2014 4:52 PM, Kevin A. McGrail wrote:
> The TLDs are hardcoded in SA 3.3.2.   We are working on not having 
> them hard-coded in 3.4.1.
I found Bug 6782, which I think you are referring to. I don't quite 
understand the details of it. But are you saying that the 'uri' and 
uridnsbl rules rely on those functions? If so, I am confused, because I 
have many spam emails with the '.link' domain that are being tagged 
properly.

Re: SA skipping URI processing

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 10/15/2014 4:49 PM, Ken Bass wrote:
> I'm using Centos 7, which means SA version 3.3.2.
>
> I am encountering several emails that are not being processed 
> correctly when checking against URI rules.
>
> 1) My local.cf has a rule to address the new .link domain which 
> spammers appear to be using recently:
>
> uri LR_LINK_TLD /^(?:https?:\/\/|mailto:)[^\/]+\.link(?:\/|$)/i
> describe LR_LINK_TLD  Contains a URL in the LINK top-level domain
> score LR_LINK_TLD     3.0
>
> 2) The URIDNSBL rules are not being executed for these emails either.
>
> SA debug output shows an empty "domains to query:" list. Huh?
> Oct 15 16:24:55.416 [15519] dbg: uridnsbl: domains to query:
>
> Here is the pastebin link to the full spam email:
>
> http://pastebin.com/RJWyGkKB
The TLDs are hardcoded in SA 3.3.2. We are working on not having them 
hardcoded in 3.4.1.

I believe someone made a patch suitable for 3.3.2 but I can't find it at 
the moment.

regards,
KAM