Posted to users@spamassassin.apache.org by Michael Grant <mi...@gmail.com> on 2018/09/22 21:55:49 UTC

using URIBL on other headers

The URIBL plugin looks for URLs in the subject and message body.

Is there some way to coax it to look in the other headers as well, for
example the From: Reply-to: or the Received headers?

Re: using URIBL on other headers

Posted by "Kevin A. McGrail" <km...@apache.org>.
On 9/26/2018 10:59 AM, Pedro David Marco wrote:
>
> On Sunday, September 23, 2018, 12:55:28 AM GMT+2, Kevin A. McGrail
> <km...@apache.org> wrote:
>
> >It's fractured.  There are various lookups in various states in
> various plugins.
>
> >From, Reply-to, Received, nameservers, rdns, webmail server headers,
> >etc. are all enhancements I want to add for RBL lookups.  Some sort of
> >generic Header lookup would be best.  I can't remember if I have a
> >bugzilla for this but I have a lot of private notes about it.
>
> "Generic header", Kevin... would be much better so SA can check URLs
> added by any external software in a specific header
> that is removed before email delivery...
Agreed.

Re: using URIBL on other headers

Posted by Pedro David Marco <pe...@yahoo.com>.
 
On Sunday, September 23, 2018, 12:55:28 AM GMT+2, Kevin A. McGrail <km...@apache.org> wrote:
>It's fractured.  There are various lookups in various states in various plugins.
>From, Reply-to, Received, nameservers, rdns, webmail server headers,
>etc. are all enhancements I want to add for RBL lookups.  Some sort of
>generic Header lookup would be best.  I can't remember if I have a
>bugzilla for this but I have a lot of private notes about it.

"Generic header", Kevin... would be much better so SA can check URLs added by any external software in a specific header that is removed before email delivery...
-----PedroD

Re: using URIBL on other headers

Posted by RW <rw...@googlemail.com>.
On Sun, 23 Sep 2018 20:37:48 +0100
Michael Grant wrote:


> I tried to read through the plugin.  I'm not a spamassassin plugin
> developer, I didn't have much luck trying to figure out how to do it
> myself.  I know this plugin only does subject and body but I saw
> nothing in the plugin itself that referenced the subject header.  So I
> am gathering it's more complex than simply running the output of an
> arbitrary header through this like the subject and body.

The subject text is the first paragraph of the normalized body which
is parsed for domains.

> I am not sure you need to do that.  Why not just run all the headers
> or rather the entire message including headers through this plugin
> just like the body, in fact, just extend its scope to look at the
> entire message rather than just the body & subject.

Most emails don't have a domain in the body, so if you start adding
a lot of domains from the headers, the number of look-ups could increase
dramatically. It could push some mail servers beyond the usage limits.
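
To illustrate the lookup-volume concern, here is a minimal sketch, assuming a deliberately crude hostname pattern and a made-up sample message. It is not how the URIBL plugin extracts domains; the function names and the sample are hypothetical. It only counts hostnames that appear in From, Reply-To and Received but not in the body, i.e. names that would become additional queries if headers were scanned too.

```python
import re
from email import message_from_string

# Crude hostname pattern, for illustration only (assumption: SpamAssassin's
# own URI parsing is far more careful and reduces names to registered domains).
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}\b", re.I)


def candidate_domains(text):
    """Return the set of lowercase hostname-like strings found in text."""
    return {m.group(0).lower() for m in DOMAIN_RE.finditer(text)}


def extra_lookups(raw_message, headers=("From", "Reply-To", "Received")):
    """Return (body_domains, header_only_domains) for a raw RFC 5322 message."""
    msg = message_from_string(raw_message)
    body = msg.get_payload() if not msg.is_multipart() else ""
    body_domains = candidate_domains(body)
    header_domains = set()
    for name in headers:
        for value in msg.get_all(name, []):
            header_domains |= candidate_domains(value)
    return body_domains, header_domains - body_domains


# Hypothetical sample: no domain in the body, several in the headers.
SAMPLE = """\
Received: from mail.example.net (mail.example.net [192.0.2.10])
\tby mx.example.org; Sat, 22 Sep 2018 21:55:49 +0000
From: Alice <alice@example.com>
Reply-To: alice@example.com
Subject: hello

Nothing to click here.
"""

body_domains, header_only = extra_lookups(SAMPLE)
print(len(body_domains), sorted(header_only))
```

In this sample the body yields no names at all, while the headers yield three, so every one of them would be a query that is not made today.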

The main point of URI blocklists is to catch the website that's the
point of contact with the spammer. I think it's going to be pretty
rare for a listed domain to appear in the headers without its being in
the body. That was my experience with my askdns rules.

The From header is already largely covered by the parse_dkim_uris
option. Reply-to might be worth trying, but most of the interesting
reply-to addresses are Freemail.

Re: using URIBL on other headers

Posted by Michael Grant <mi...@gmail.com>.
On Sat, 22 Sep 2018 at 23:55, Kevin A. McGrail <km...@apache.org> wrote:

> On 9/22/2018 5:55 PM, Michael Grant wrote:
> > The URIBL plugin looks for URLs in the subject and message body.
> >
> > Is there some way to coax it to look in the other headers as well, for
> > example the From: Reply-to: or the Received headers?
> >
> >
> It's fractured.  There are various lookups in various states in various
> plugins.
>
> From, Reply-to, Received, nameservers, rdns, webmail server headers,
> etc. are all enhancements I want to add for RBL lookups.  Some sort of
> generic Header lookup would be best.  I can't remember if I have a
> bugzilla for this but I have a lot of private notes about it.
>
>
Thanks Kevin, good to hear that you and other folks want this too; it seems
to make sense!

I tried to read through the plugin.  I'm not a spamassassin plugin
developer, I didn't have much luck trying to figure out how to do it
myself.  I know this plugin only does subject and body but I saw nothing in
the plugin itself that referenced the subject header.  So I am gathering
it's more complex than simply running the output of an arbitrary header
through this like the subject and body.

Is this difficult because you feel you need to parse out domain names from
all these fields?

I am not sure you need to do that.  Why not just run all the headers or
rather the entire message including headers through this plugin just like
the body, in fact, just extend its scope to look at the entire message
rather than just the body & subject.

Just a thought.  Hopefully it's really that easy; if you can tell me
how to extend the scope of this to encompass the entire message, we could
do this sooner rather than later!

Thanks for your excellent plugin by the way!

Michael Grant

Re: using URIBL on other headers

Posted by "Kevin A. McGrail" <km...@apache.org>.
On 9/22/2018 5:55 PM, Michael Grant wrote:
> The URIBL plugin looks for URLs in the subject and message body.
>
> Is there some way to coax it to look in the other headers as well, for
> example the From: Reply-to: or the Received headers?
>
>
It's fractured.  There are various lookups in various states in various
plugins.

From, Reply-to, Received, nameservers, rdns, webmail server headers,
etc. are all enhancements I want to add for RBL lookups.  Some sort of
generic Header lookup would be best.  I can't remember if I have a
bugzilla for this but I have a lot of private notes about it.

Regards,

KAM

-- 
Kevin A. McGrail
VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171


Re: using URIBL on other headers

Posted by Rob McEwen <ro...@invaluement.com>.
On 9/22/2018 5:55 PM, Michael Grant wrote:
> The URIBL plugin looks for URLs in the subject and message body.
> Is there some way to coax it to look in the other headers as well, for 
> example the From: Reply-to: or the Received headers?


Michael,

This reminds me of that saying, "just because you can, doesn't mean you 
should" - and along those lines, I have some interesting observations 
about this:

(1) some URI/domain blacklists are ONLY intended for blocking on the 
domain or IP that is at the base of clickable links inside the body of 
the message. These will often have a small (but critical) uptick in 
false positives if used to check against domains found in the SMTP 
envelope (FROM, PTR record, HELO), with typically a very small increase 
in additional spams blocked. SO BE CAREFUL -AND- if you use a URI/domain 
blacklist in that way and they don't prescribe that type of usage, don't 
complain to them or anyone about any resulting false positives - because 
it would then be your MIS-usage of their list that caused those false 
positives.

(2) Even so, there really are SOME series of spams that can be safely 
blocked based on domains that are in the SMTP envelope (FROM, PTR 
record, HELO). In some cases, these are snowshoe spammers who are 
sending from their own spammy domain - but where this domain is NOT 
found in a clickable link inside the body of the message - they really 
are trying to get the user to hit "reply". So there really is a purpose 
for this, even if it is a very small percentage of all spam.

(3) However, even with that being a very small percentage of all spam - LARGE 
mail hosters LOVE THIS IDEA. Why? Because it is SO EFFICIENT for them to 
be able to block MORE spam based on information in the SMTP envelope - 
BEFORE the "data" command. Sometimes, this helps block messages where 
the domain was in a clickable link inside the body of the message - but 
it is still MORE EFFICIENT to block that based on the domain also being 
in the SMTP envelope.

(4) ABOUT THOSE FALSE POSITIVES: One of the main reasons that this is so 
risky for False Positives... is because two things have been epidemic in 
recent years: (a) a web site gets hijacked by a criminal spammer, who 
installs pages there that redirect to pornographic dating sites or pill 
spam websites -AND/OR- (b) an email account on the mail server gets its 
credentials hijacked and starts spewing spam. HERE IS THE PROBLEM: 
*MOST* of the time, one or the other happens, (a or b) but not both. 
Therefore, if (a) happens, they are sure to land on traditional URI 
blacklists like SURBL, URIBL, and ivmURI. But this company - whose web 
site was hacked - might not have a single spam coming from their mail 
server. Yet, if you do the SMTP envelope checking against such URI 
blacklists - you're going to have a substantially higher amount of false 
positives due to blocking ALL of those emails that merely have a "FROM" 
address ending in that domain name - even though NONE of THOSE messages 
are spam.

(5) So which lists *DO* support blocking on the SMTP envelope? Spamhaus' 
DBL list is designed for this. However, invaluement's ivmURI list is NOT 
supposed to be used in this manner. SURBL and URIBL were originally 
designed to not be used in this way - but that might have changed in 
recent years? I recommend checking on that. In the meantime, I recommend 
*ONLY* using Spamhaus' DBL list in this way. (possibly SURBL or URIBL 
too? - but double check on that!)

(6) QUESTION: So why would a list not support both blocking methods? For 
example, why wouldn't ivmURI support this method?

ANSWER: What Spamhaus did with DBL, while interesting, put them at a 
strategic disadvantage, and there isn't a thing they can do about that 
without making fundamental changes to their strategy. Recall that false 
positive scenario mentioned earlier, where a hacked web site causing a 
URI-list blacklisting can lead to substantially more false positives due 
to only hitting on legit mails when blocking based on this domain being 
in the SMTP envelope? Well.. the OPPOSITE situation ALSO causes more 
false positives. When their email system has a hijacked email account, 
but their web site was NOT hacked - then domain blacklists that 
prescribe BOTH blocking methods and blacklist that domain... are going 
to then start blocking ALL messages that have that domain as a hyperlink 
inside the body of the message, even if THOSE messages are legit. This 
will then cause a substantial number of false positives that were not 
part of those hijacked outbound messages. So this works both ways. The 
problem with such domain blacklists that prescribe both uses... is that 
they either have to settle for (a) more false positives -OR- (b) more 
false negatives. In other words, the higher collateral damage potential 
means that there is going to be more collateral damage when they "take 
the bait" and blacklist the domain -OR- their desire to limit false 
positives will cause them to defer on the listing - even though it would 
have been an excellent and justified ratio of spam-to-ham blocked, with 
little collateral damage if the mail systems using that list could have 
ONLY blocked using one method or the other, NOT both! DBL likely errs on 
the side of less collateral damage - so it should be safe to use DBL 
for blocking based on both methods, as they prescribe, especially 
considering Spamhaus' reputation for extremely low false positives. 
Then, other URI lists can pick up the slack on the occasional False 
Negatives.

(7) Given this information, at invaluement, we have solved this problem 
by creating a new domain blacklist ("ivmSED") that is independent of 
ivmURI, where ivmSED is a domain blacklist used ONLY for blocking based 
on the domains found in the SMTP envelope (FROM, PTR record, HELO) - and 
where ivmSED should NOT be used for blocking domains in clickable links in 
the body of the message, since that is the job of our ivmURI list. That way, 
ivmSED and ivmURI are independent, and we then have the flexibility to 
block a domain using either method independently, or both together, for 
the approach that most surgically targets the spam, keeps collateral 
damage to a minimum, and without compromises that lead to more false 
negatives. ivmSED has just recently entered beta testing. (SED = 
"Sender's Envelope Domain").

-- 
Rob McEwen
https://www.invaluement.com



Re: using URIBL on other headers

Posted by RW <rw...@googlemail.com>.
On Sat, 22 Sep 2018 22:55:49 +0100
Michael Grant wrote:

> The URIBL plugin looks for URLs in the subject and message body.
> 
> Is there some way to coax it to look in the other headers as well, for
> example the From: Reply-to: or the Received headers?

You can create individual rules for "From:" like:

 askdns  AUTHOR_IN_URIBL_BLACK  _AUTHORDOMAIN_.multi.uribl.com  A 2

However, since I wrote this rule I've had 25 hits compared with 940 for
URIBL_BLACK. Only 4 spams hit AUTHOR_IN_URIBL_BLACK without URIBL_BLACK.
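
For readers unfamiliar with DNS blocklist queries, the following is a rough sketch of what a rule like the one above amounts to. It rests on two assumptions that should be checked against the AskDNS and URIBL documentation: that the trailing "2" is applied as a bitmask to the returned A record, and that the list flags a "black" listing with the 2 bit (e.g. 127.0.0.2). The function names are hypothetical, and no DNS query is actually sent.

```python
import ipaddress
from email.utils import parseaddr


def author_domain(from_header):
    """Extract the lowercase domain part of the address in a From: header."""
    _, addr = parseaddr(from_header)
    return addr.rpartition("@")[2].lower()


def dnsbl_query_name(domain, zone="multi.uribl.com"):
    """Build the name to look up: the domain prepended to the list's zone."""
    return f"{domain}.{zone}"


def matches_bitmask(answer_ip, mask=2):
    """Assumed semantics of the 'A 2' filter: any masked bit set is a hit."""
    return (int(ipaddress.IPv4Address(answer_ip)) & mask) != 0


name = dnsbl_query_name(author_domain("Spammer <offers@example.com>"))
print(name)
print(matches_bitmask("127.0.0.2"), matches_bitmask("127.0.0.4"))
```

Under these assumptions, an answer of 127.0.0.2 would count as a hit for the rule, while an answer with only other bits set would not.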