Posted to users@spamassassin.apache.org by Michael Grant <mi...@gmail.com> on 2018/09/22 21:55:49 UTC
using URIBL on other headers
The URIBL plugin looks for URLs in the subject and message body.
Is there some way to coax it to look in the other headers as well, for
example the From: Reply-to: or the Received headers?
Re: using URIBL on other headers
Posted by "Kevin A. McGrail" <km...@apache.org>.
On 9/26/2018 10:59 AM, Pedro David Marco wrote:
>
> On Sunday, September 23, 2018, 12:55:28 AM GMT+2, Kevin A. McGrail
> <km...@apache.org> wrote:
>
> >It's fractured. There are various lookups in various states in
> various plugins.
>
> >From, Reply-to, Received, nameservers, rdns, webmail server headers,
> >etc. are all enhancements I want to add for RBL lookups. Some sort of
> >generic Header lookup would be best. I can't remember if I have a
> >bugzilla for this but I have a lot of private notes about it.
>
> "Generic header", Kevin... would be much better so SA can check URLs
> added by any external software in a specific header
> that is removed before email delivery...
Agreed.
Re: using URIBL on other headers
Posted by Pedro David Marco <pe...@yahoo.com>.
On Sunday, September 23, 2018, 12:55:28 AM GMT+2, Kevin A. McGrail <km...@apache.org> wrote:
>It's fractured. There are various lookups in various states in various plugins.
>From, Reply-to, Received, nameservers, rdns, webmail server headers,
>etc. are all enhancements I want to add for RBL lookups. Some sort of
>generic Header lookup would be best. I can't remember if I have a
>bugzilla for this but I have a lot of private notes about it.
"Generic header", Kevin... would be much better so SA can check URLs added by any external software in a specific header that is removed before email delivery...
-----PedroD
Re: using URIBL on other headers
Posted by RW <rw...@googlemail.com>.
On Sun, 23 Sep 2018 20:37:48 +0100
Michael Grant wrote:
> I tried to read through the plugin. I'm not a spamassassin plugin
> developer, I didn't have much luck trying to figure out how to do it
> myself. I know this plugin only does subject and body but I saw
> nothing in the plugin itself that referenced the subject header. So I
> am gathering it's more complex than simply running the output of an
> arbitrary header through this like the subject and body.
The subject text is the first paragraph of the normalized body, which
is parsed for domains.
> I am not sure you need to do that. Why not just run all the headers
> or rather the entire message including headers through this plugin
> just like the body, in fact, just extend its scope to look at the
> entire message rather than just the body & subject.
Most emails don't have a domain in the body, so if you start adding
a lot of domains from the headers, the number of look-ups could increase
dramatically. It could push some mail servers beyond the usage limits.
The main point of URI blocklists is to catch the website that's the
point of contact with the spammer. I think it's going to be pretty
rare for a listed domain to appear in the headers without also being in
the body. That was my experience with my askdns rules.
The from header is already largely covered by the parse_dkim_uris
option. Reply-to might be worth trying, but most of the interesting
reply-to addresses are Freemail.
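
To make the look-up volume concern concrete, here is a minimal Python sketch that counts how many extra domains the From and Reply-To headers would add beyond what the body already contributes. It is only an illustration, not how the URIBL plugin works: the function name, the choice of headers, and the idea of passing in an already-extracted set of body domains are all assumptions.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical choice of headers to scan; not a SpamAssassin setting.
HEADER_FIELDS = ("From", "Reply-To")

def extra_header_domains(raw_message: str, body_domains: set[str]) -> set[str]:
    """Return header address domains that are not already in body_domains.

    Each returned domain would be one additional blocklist query.
    """
    msg = message_from_string(raw_message)
    found = set()
    for field in HEADER_FIELDS:
        # getaddresses() yields (display name, address) pairs.
        for _name, addr in getaddresses(msg.get_all(field, [])):
            if "@" in addr:
                found.add(addr.rsplit("@", 1)[1].lower())
    return found - {d.lower() for d in body_domains}
```

When the header domain also appears in the body, the result is empty and no extra query is needed, which matches the observation above that header-only hits are rare.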
Re: using URIBL on other headers
Posted by Michael Grant <mi...@gmail.com>.
On Sat, 22 Sep 2018 at 23:55, Kevin A. McGrail <km...@apache.org> wrote:
> On 9/22/2018 5:55 PM, Michael Grant wrote:
> > The URIBL plugin looks for URLs in the subject and message body.
> >
> > Is there some way to coax it to look in the other headers as well, for
> > example the From: Reply-to: or the Received headers?
> >
> >
> It's fractured. There are various lookups in various states in various
> plugins.
>
> From, Reply-to, Received, nameservers, rdns, webmail server headers,
> etc. are all enhancements I want to add for RBL lookups. Some sort of
> generic Header lookup would be best. I can't remember if I have a
> bugzilla for this but I have a lot of private notes about it.
>
>
Thanks Kevin, good to hear that you and other folks want this too; it seems
to make sense!
I tried to read through the plugin. I'm not a SpamAssassin plugin
developer, and I didn't have much luck trying to figure out how to do it
myself. I know this plugin only does subject and body, but I saw nothing in
the plugin itself that referenced the subject header. So I gather
it's more complex than simply running the output of an arbitrary header
through this like the subject and body.
Is this difficult because you feel you need to parse out domain names from
all these fields?
I am not sure you need to do that. Why not just run all the headers, or
rather the entire message including headers, through this plugin just like
the body? In fact, just extend its scope to look at the entire message
rather than just the body & subject.
Just a thought. If it's really that easy, or if you can tell me how to
extend the scope of this to encompass the entire message, hopefully we
could do this sooner rather than later!
Thanks for your excellent plugin by the way!
Michael Grant
Re: using URIBL on other headers
Posted by "Kevin A. McGrail" <km...@apache.org>.
On 9/22/2018 5:55 PM, Michael Grant wrote:
> The URIBL plugin looks for URLs in the subject and message body.
>
> Is there some way to coax it to look in the other headers as well, for
> example the From: Reply-to: or the Received headers?
>
>
It's fractured. There are various lookups in various states in various
plugins.
From, Reply-to, Received, nameservers, rdns, webmail server headers,
etc. are all enhancements I want to add for RBL lookups. Some sort of
generic Header lookup would be best. I can't remember if I have a
bugzilla for this but I have a lot of private notes about it.
Regards,
KAM
--
Kevin A. McGrail
VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171
Re: using URIBL on other headers
Posted by Rob McEwen <ro...@invaluement.com>.
On 9/22/2018 5:55 PM, Michael Grant wrote:
> The URIBL plugin looks for URLs in the subject and message body.
> Is there some way to coax it to look in the other headers as well, for
> example the From: Reply-to: or the Received headers?
Michael,
This reminds me of that saying, "just because you can, doesn't mean you
should" - and along those lines, I have some interesting observations
about this:
(1) some URI/domain blacklists are ONLY intended for blocking on the
domain or IP that is at the base of clickable links inside the body of
the message. These will often have a small (but critical) uptick in
false positives if used to check against domains found in the SMTP
envelope (FROM, PTR record, HELO), with typically a very small increase
in additional spams blocked. SO BE CAREFUL -AND- if you use a URI/domain
blacklist in that way and they don't prescribe that type of usage, don't
complain to them or anyone about any resulting false positives - because
it would then be your MIS-usage of their list that caused those false
positives.
(2) Even so, there really are SOME series of spams that can be safely
blocked based on domains that are in the SMTP envelope (FROM, PTR
record, HELO). In some cases, these are snowshoe spammers who are
sending from their own spammy domain - but where this domain is NOT
found in a clickable link inside the body of the message - they really
are trying to get the user to hit "reply". So there really is a purpose
for this, even if it is a very small percentage of all spam.
(3) However, even with that being a very small percentage of all spam -
LARGE mail hosters LOVE THIS IDEA. Why? Because it is SO EFFICIENT for
them to be able to block MORE spam based on information in the SMTP
envelope - BEFORE the "data" command. Sometimes, this helps block
messages where the domain was in a clickable link inside the body of the
message - but it is still MORE EFFICIENT to block that based on the
domain also being in the SMTP envelope.
(4) ABOUT THOSE FALSE POSITIVES: One of the main reasons that this is so
risky for False Positives... is because two things are epidemic in
recent years: (a) a web site gets hijacked by a criminal spammer, who
installs pages there that redirect to pornographic dating sites or pill
spam websites -AND/OR- (b) an email account on the mail server gets its
credentials hijacked and starts spewing spam. HERE IS THE PROBLEM:
*MOST* of the time, one or the other happens, (a or b) but not both.
Therefore, if (a) happens, they are sure to land on traditional URI
blacklists like SURBL, URIBL, and ivmURI. But this company - whose web
site was hacked - might not have a single spam coming from their mail
server. Yet, if you do the SMTP envelope checking against such URI
blacklists - you're going to have a substantially higher number of false
positives due to blocking ALL of those emails that merely have a "FROM"
address ending in that domain name - even though NONE of THOSE messages
are spam.
(5) So which lists *DO* support blocking on the SMTP envelope? Spamhaus'
DBL list is designed for this. However, invaluement's ivmURI list is NOT
supposed to be used in this manner. SURBL and URIBL were originally
designed to not be used in this way - but that might have changed in
recent years? I recommend checking on that. In the meantime, I recommend
*ONLY* using Spamhaus' DBL list in this way. (possibly SURBL or URIBL
too? - but double check on that!)
(6) QUESTION: So why would a list not support both blocking methods? For
example, why wouldn't ivmURI support this method?
ANSWER: What Spamhaus did with DBL, while interesting, put them at a
strategic disadvantage, and there isn't a thing they can do about that
without making fundamental changes to their strategy. Recall that false
positive scenario mentioned earlier, where a hacked web site causing a
URI-list blacklisting can lead to substantially more false positives due
to only hitting on legit mails when blocking based on this domain being
in the SMTP envelope? Well.. the OPPOSITE situation ALSO causes more
false positives. When their email system has a hijacked email account,
but their web site was NOT hacked - then domain blacklists that
prescribe BOTH blocking methods and blacklist that domain... are going
to then start blocking ALL messages that have that domain as a hyperlink
inside the body of the message, even if THOSE messages are legit. This
will then cause a substantial number of false positives that were not
part of those hijacked outbound messages. So this works both ways. The
problem with such domain blacklists that prescribe both uses... is that
they either have to settle for (a) more false positives -OR- (b) more
false negatives. In other words, the higher collateral damage potential
means that there is going to be more collateral damage when they "take
the bait" and blacklist the domain -OR- their desire to limit false
positives will cause them to defer on the listing - even though it would
have been an excellent and justified ratio of spam-to-ham blocked, with
little collateral damage if the mail systems using that list could have
ONLY blocked using one method or the other, NOT both! DBL likely errs on
the side of less collateral damage - so it should be safe to use DBL
for blocking based on both methods, as they prescribe, especially
considering Spamhaus' reputation for extremely low false positives.
Then, other URI lists can pick up the slack on the occasional False
Negatives.
(7) Given this information, at invaluement, we have solved this problem
by creating a new domain blacklist ("ivmSED") that is independent of
ivmURI, where ivmSED is a domain blacklist used ONLY for blocking based
on the domains found in the SMTP envelope (FROM, PTR record, HELO) - and
where ivmSED should NOT be used for blocking domains in clickable links
in the body of the message, since that is the job of our ivmURI list.
That way, ivmSED and ivmURI are independent, and we then have the
flexibility to block a domain using either method independently, or both
together, for the approach that most surgically targets the spam and
keeps collateral damage to a minimum, without compromises that lead to
more false negatives. ivmSED has just recently entered beta testing.
(SED = "Sender's Envelope Domain").
--
Rob McEwen
https://www.invaluement.com
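
The usage guidance in points (5) and (7) above can be summarised as a small lookup table. The Python sketch below is only an illustration of that guidance as stated in this message; the table and function names are hypothetical, and the entries marked None are the ones the message says to double check with the list operator before relying on them.

```python
# Prescribed usage per list, as described in the message above.
# True = prescribed, False = not prescribed, None = unconfirmed (check first).
PRESCRIBED_USE = {
    "DBL":    {"body": True,  "envelope": True},
    "ivmURI": {"body": True,  "envelope": False},
    "ivmSED": {"body": False, "envelope": True},
    "SURBL":  {"body": True,  "envelope": None},
    "URIBL":  {"body": True,  "envelope": None},
}

def lists_for(context: str, include_unconfirmed: bool = False) -> list[str]:
    """Return the lists that may be queried for 'body' or 'envelope' domains."""
    chosen = []
    for name, uses in PRESCRIBED_USE.items():
        allowed = uses[context]
        if allowed is True or (allowed is None and include_unconfirmed):
            chosen.append(name)
    return chosen
```

With the defaults, lists_for("envelope") yields only DBL and ivmSED, which is the conservative reading of the advice above.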
Re: using URIBL on other headers
Posted by RW <rw...@googlemail.com>.
On Sat, 22 Sep 2018 22:55:49 +0100
Michael Grant wrote:
> The URIBL plugin looks for URLs in the subject and message body.
>
> Is there some way to coax it to look in the other headers as well, for
> example the From: Reply-to: or the Received headers?
You can create individual rules for "From:" like:
askdns AUTHOR_IN_URIBL_BLACK _AUTHORDOMAIN_.multi.uribl.com A 2
However, since I wrote this I've had 25 hits, compared with 940 for
URIBL_BLACK. Only 4 spams hit AUTHOR_IN_URIBL_BLACK without URIBL_BLACK.
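
For anyone unfamiliar with how a rule like this resolves, here is a minimal Python sketch of the lookup it implies. It assumes the trailing 2 is applied as a bitmask to the returned A record, and that multi.uribl.com answers 127.0.0.2 for its black list. The function names are hypothetical and no DNS query is actually made.

```python
import ipaddress

def build_query(author_domain: str, zone: str = "multi.uribl.com") -> str:
    """Return the DNS name that would be looked up for an author domain."""
    return f"{author_domain.strip('.').lower()}.{zone}"

def hits_bitmask(a_record: str, mask: int = 2) -> bool:
    """True if any bit of `mask` is set in the returned IPv4 address.

    Assumption: the rule's trailing number is treated as a bitmask.
    """
    return bool(int(ipaddress.IPv4Address(a_record)) & mask)
```

Under those assumptions, an answer of 127.0.0.2 or 127.0.0.6 would count as a hit for mask 2, while 127.0.0.4 would not.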