Posted to users@spamassassin.apache.org by Mike Marynowski <mi...@singulink.com> on 2019/02/27 17:16:20 UTC

Spam rule for HTTP/HTTPS request to sender's root domain

Hi everyone,

I haven't been able to find any existing spam rules or checks that do 
this, but from my analysis of ham/spam I'm getting I think this would be 
a really great addition. Almost all of the spam emails that are coming 
through do not have a working website at the root domain of the sender. 
Of the 100 last legitimate email domains that have sent me mail, 100% of 
them have working websites at the root domain.

As far as I can tell there isn't currently a way to build a rule that 
does this and a Perl plugin would have to be created. Is this an 
accurate assessment? Can you recommend some good resources for building 
a SpamAssassin plugin if this is the case?

Thanks!


Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
Sorry, I meant I thought it was doing those checks because I had been 
playing with A record checks before and figured the rules would have 
them enabled by default. I tried to find the rules after I sent that 
message and realized they were actually the sender domain A record 
checks done in my MTA.

On 3/1/2019 2:26 PM, Antony Stone wrote:
> On Friday 01 March 2019 at 17:37:18, Mike Marynowski wrote:
>
>> Quick sampling of 10 emails: 8 of them have valid A records on the email
>> domain. I presumed SpamAssassin was already doing simple checks like that.
> That doesn't sound like a good idea to me (presuming, I mean).
>
>
> Antony.
>


Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Antony Stone <An...@spamassassin.open.source.it>.
On Friday 01 March 2019 at 17:37:18, Mike Marynowski wrote:

> Quick sampling of 10 emails: 8 of them have valid A records on the email
> domain. I presumed SpamAssassin was already doing simple checks like that.

That doesn't sound like a good idea to me (presuming, I mean).


Antony.

-- 
"The future is already here.   It's just not evenly distributed yet."

 - William Gibson

                                                   Please reply to the list;
                                                         please *don't* CC me.

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
On 3/1/2019 1:07 PM, RW wrote:
> Sure, but had it turned-out that most of these domains didn't have the A
> record necessary for your HTTP test, it wouldn't have been worth doing
> anything more complicated.

I've noticed a lot of the spam domains appear to point to actual web 
servers but throw 403 or 503 errors, which A records wouldn't help with 
and which I've taken into account here. As for being "more complicated" 
- it's basically done and running in my test environment for final 
tweaking, haha, so a bit late now :P It was only a day's work to put 
everything together, including the DNS service and caching layer. 
Unless you mean complicated in the technical sense as opposed to 
effort-wise.

> You don't need an A record for email. The last time I looked it just
> tests that there's enough DNS for a bounce to be received, so an A or
> MX for the sender domain.

I'm confusing different tests here; you can disregard my previous message.


Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by RW <rw...@googlemail.com>.
On Fri, 1 Mar 2019 11:37:18 -0500
Mike Marynowski wrote:

> Looking for an A record on what - just the email address domain, or
> the chain of parent domains as well? If the latter, a lack of an A
> record will cause this check to fail anyway, so it's kind of built in.

Sure, but had it turned-out that most of these domains didn't have the A
record necessary for your HTTP test, it wouldn't have been worth doing
anything more complicated. 

> Quick sampling of 10 emails: 8 of them have valid A records on the
> email domain. I presumed SpamAssassin was already doing simple checks
> like that.

You don't need an A record for email. The last time I looked it just
tests that there's enough DNS for a bounce to be received, so an A or
MX for the sender domain.

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
Looking for an A record on what - just the email address domain, or the 
chain of parent domains as well? If the latter, a lack of an A record 
will cause this check to fail anyway, so it's kind of built in.

Quick sampling of 10 emails: 8 of them have valid A records on the email 
domain. I presumed SpamAssassin was already doing simple checks like that.

On 3/1/2019 10:23 AM, RW wrote:
> On Wed, 27 Feb 2019 12:16:20 -0500
> Mike Marynowski wrote:
>> Almost all of the spam emails that are
>> coming through do not have a working website at the root domain of
>> the sender.
> Did you establish what fraction of this spam could be caught just by
> looking for an A record?



Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by RW <rw...@googlemail.com>.
On Wed, 27 Feb 2019 12:16:20 -0500
Mike Marynowski wrote:
> Almost all of the spam emails that are
> coming through do not have a working website at the root domain of
> the sender. 

Did you establish what fraction of this spam could be caught just by
looking for an A record? 

Re: Open source

Posted by Ralph Seichter <ab...@monksofcool.net>.
* RW:

> You're missing the point.

It may surprise you, but there is more than one "point" to having
packages, and I can choose to make whatever point I damn well
please. :-)

-Ralph

Re: Open source (WAS: Spam rule for HTTP/HTTPS request to sender's root domain)

Posted by RW <rw...@googlemail.com>.
On Thu, 21 Mar 2019 18:26:15 +0100
Ralph Seichter wrote:

> * Mike Marynowski:
> 
> > I was more asking if there is a good reason to build packages
> > intended for local installation by email server operators and I
> > don't think there really is.  
> 
> As a maintainer of several Gentoo Linux ebuilds, I agree you should
> leave packaging to the various Linux distributions. Building, testing,
> dependency management etc. vary significantly. Best leave that to the
> folks who do it on a regular basis.

You're missing the point. The reason for not having packages is that
it's more accurate if everyone shares the same database.

Re: Open source (WAS: Spam rule for HTTP/HTTPS request to sender's root domain)

Posted by Ralph Seichter <ab...@monksofcool.net>.
* Mike Marynowski:

> I was more asking if there is a good reason to build packages intended
> for local installation by email server operators and I don't think
> there really is.

As a maintainer of several Gentoo Linux ebuilds, I agree you should
leave packaging to the various Linux distributions. Building, testing,
dependency management etc. vary significantly. Best leave that to the
folks who do it on a regular basis.

-Ralph

Re: Open source (WAS: Spam rule for HTTP/HTTPS request to sender's root domain)

Posted by Mike Marynowski <mi...@singulink.com>.
Perhaps I should have been clearer - I'm not against posting the code 
for any reason and I am planning to do that anyway in case anyone wants 
to look at it or chip in improvements and whatnot.

I'm an active contributor on many open source projects and I have fully 
embraced OSS :) I was more asking if there is a good reason to build 
packages intended for local installation by email server operators, and 
I don't think there really is. There's a fundamental difference in how 
the project would be set up if it were intended to be installed by all 
email server operators, i.e. writing a config file loader instead of 
hardcoding values, allowing more flexibility, building packages for 
different operating systems, etc. What I'm saying is I don't think I 
will be officially supporting that route, as it seems more beneficial to 
collaborate on a central database, though people are obviously free to 
do with the code as they wish.

Cheers!

Mike

On 3/21/2019 5:42 AM, Tom Hendrikx wrote:
> On 20-03-19 19:56, Mike Marynowski wrote:
>> A couple people asked about me posting the code/service so they could
>> run it on their own systems but I'm currently leaning away from that. I
>> don't think there is any benefit to doing that instead of just utilizing
>> the centralized service. The whole thing works better if everyone using
>> it queries a central service and helps avoid people easily making bad
>> mistakes like the one above and then spending hours scrambling to try to
>> find non-existent botnet infections on their network while mail bounces
>> because they are on a blocklist :( If someone has a good reason for
>> making the service locally installable let me know though, haha.
> When people are interested in seeing the code, their main incentive for
> such a request is probably not that they want to run it themselves. They
> might, in no particular order:
>
> - would like to learn from what you're doing
> - would like to see how you're treating their contributed data
> - would like to verify the listing policy that you're proposing
> - would like to study if there could be better criteria for
> listing/unlisting than the ones currently available
> - would like to change things in the software and contribute that
> back for the benefit of everyone
> - would like to squash bugs that you might currently be missing
> - would like to help out on further development of the service if or
> when your time is limited
> - would prefer not to depend on a single person to maintain a service
> they like
>
> This is called open source, and it's a good thing. For details on the
> philosophy behind it,
> http://www.catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/ is
> a good read.
>
> In short: if you like your project to prosper, put it on github for
> everyone to see.
>
> Kind regards,
>
> 	Tom
>



Re: Open source (WAS: Spam rule for HTTP/HTTPS request to sender's root domain)

Posted by Mike Marynowski <mi...@singulink.com>.
Here ya go ;)

https://github.com/mikernet/HttpCheckDnsServer

On 3/21/2019 5:42 AM, Tom Hendrikx wrote:
> On 20-03-19 19:56, Mike Marynowski wrote:
>> A couple people asked about me posting the code/service so they could
>> run it on their own systems but I'm currently leaning away from that. I
>> don't think there is any benefit to doing that instead of just utilizing
>> the centralized service. The whole thing works better if everyone using
>> it queries a central service and helps avoid people easily making bad
>> mistakes like the one above and then spending hours scrambling to try to
>> find non-existent botnet infections on their network while mail bounces
>> because they are on a blocklist :( If someone has a good reason for
>> making the service locally installable let me know though, haha.
> When people are interested in seeing the code, their main incentive for
> such a request is probably not that they want to run it themselves. They
> might, in no particular order:
>
> - would like to learn from what you're doing
> - would like to see how you're treating their contributed data
> - would like to verify the listing policy that you're proposing
> - would like to study if there could be better criteria for
> listing/unlisting than the ones currently available
> - would like to change things in the software and contribute that
> back for the benefit of everyone
> - would like to squash bugs that you might currently be missing
> - would like to help out on further development of the service if or
> when your time is limited
> - would prefer not to depend on a single person to maintain a service
> they like
>
> This is called open source, and it's a good thing. For details on the
> philosophy behind it,
> http://www.catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/ is
> a good read.
>
> In short: if you like your project to prosper, put it on github for
> everyone to see.
>
> Kind regards,
>
> 	Tom
>



Open source (WAS: Spam rule for HTTP/HTTPS request to sender's root domain)

Posted by Tom Hendrikx <to...@whyscream.net>.
On 20-03-19 19:56, Mike Marynowski wrote:
> 
> A couple people asked about me posting the code/service so they could
> run it on their own systems but I'm currently leaning away from that. I
> don't think there is any benefit to doing that instead of just utilizing
> the centralized service. The whole thing works better if everyone using
> it queries a central service and helps avoid people easily making bad
> mistakes like the one above and then spending hours scrambling to try to
> find non-existent botnet infections on their network while mail bounces
> because they are on a blocklist :( If someone has a good reason for
> making the service locally installable let me know though, haha.

When people are interested in seeing the code, their main incentive for
such a request is probably not that they want to run it themselves. They
might, in no particular order:

- would like to learn from what you're doing
- would like to see how you're treating their contributed data
- would like to verify the listing policy that you're proposing
- would like to study if there could be better criteria for
listing/unlisting than the ones currently available
- would like to change things in the software and contribute that back
for the benefit of everyone
- would like to squash bugs that you might currently be missing
- would like to help out on further development of the service if or
when your time is limited
- would prefer not to depend on a single person to maintain a service
they like

This is called open source, and it's a good thing. For details on the
philosophy behind it,
http://www.catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/ is
a good read.

In short: if you'd like your project to prosper, put it on GitHub for
everyone to see.

Kind regards,

	Tom


Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
Continuing to fine-tune this service - thank you to everyone testing it. 
Some updates were pushed out yesterday:

  * Initial new domain "grace period" reduced to 8 minutes (down from 15 
mins) - 4 attempts are made within this time to get a valid HTTP response
  * Mozilla browser spoofing is implemented to avoid problems with 
websites that block HttpClient requests
  * Fixes to NXDOMAIN negative result caching appear to be working well now

Some lessons learned in the meantime as well. It turns out that letting 
the HTTP test run through an email server IP is a terrible idea, as it 
will put the IP on some blocklists for attempting to make HTTP 
connections to botnet command & control honeypot servers if someone 
happens to query one of those domains, LOL.

A couple of people asked about me posting the code/service so they could 
run it on their own systems, but I'm currently leaning away from that. I 
don't think there is any benefit to doing that instead of just utilizing 
the centralized service. The whole thing works better if everyone using 
it queries a central service, and it helps avoid people easily making 
bad mistakes like the one above and then spending hours scrambling to 
try to find non-existent botnet infections on their network while mail 
bounces because they are on a blocklist :( If someone has a good reason 
for making the service locally installable, let me know though, haha.


Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Jari Fredriksson <ja...@iki.fi>.

> Antony Stone <An...@spamassassin.open.source.it> kirjoitti 13.3.2019 kello 20.36:
> 
> On Wednesday 13 March 2019 at 19:21:47, Jari Fredriksson wrote:
> 
>> What would it result for this:
>> 
>> I have a couple of domains that do not have any services at the root
>> domain name. However, the server the A record points to does have a web
>> server that acts as a reverse proxy for many subdomains, which will be
>> served a web page. An HTTP 503 is returned by the pound reverse for the
>> root domains.
> 
> What is a "pound reverse"?
> 
> Antony.
> 
>> gladiator:~ jarif$ curl -v http://bitwell.biz
>> * Rebuilt URL to: http://bitwell.biz/
>> *   Trying 138.201.119.25...
>> * TCP_NODELAY set
>> * Connected to bitwell.biz (138.201.119.25) port 80 (#0)
>> 
>>> GET / HTTP/1.1
>>> Host: bitwell.biz
>>> User-Agent: curl/7.54.0
>>> Accept: */*
>> 
>> * HTTP 1.0, assume close after body
>> < HTTP/1.0 503 Service Unavailable
>> < Content-Type: text/html
>> < Content-Length: 53
>> < Expires: now
>> < Pragma: no-cache
>> < Cache-control: no-cache,no-store
>> <
>> * Closing connection 0
>> 
>> Br. Jarif
> 

Pound reverse proxy. I forgot the "proxy" in that. Pound is a simple but effective reverse proxy software package (FOSS) for HTTP(S).

Br. Jarif



Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Antony Stone <An...@spamassassin.open.source.it>.
On Wednesday 13 March 2019 at 19:21:47, Jari Fredriksson wrote:

> What would it result for this:
> 
> I have a couple of domains that do not have any services at the root
> domain name. However, the server the A record points to does have a web
> server that acts as a reverse proxy for many subdomains, which will be
> served a web page. An HTTP 503 is returned by the pound reverse for the
> root domains.

What is a "pound reverse"?

Antony.

> gladiator:~ jarif$ curl -v http://bitwell.biz
> * Rebuilt URL to: http://bitwell.biz/
> *   Trying 138.201.119.25...
> * TCP_NODELAY set
> * Connected to bitwell.biz (138.201.119.25) port 80 (#0)
> 
> > GET / HTTP/1.1
> > Host: bitwell.biz
> > User-Agent: curl/7.54.0
> > Accept: */*
> 
> * HTTP 1.0, assume close after body
> < HTTP/1.0 503 Service Unavailable
> < Content-Type: text/html
> < Content-Length: 53
> < Expires: now
> < Pragma: no-cache
> < Cache-control: no-cache,no-store
> <
> * Closing connection 0
> 
> Br. Jarif

-- 
Numerous psychological studies over the years have demonstrated that the 
majority of people genuinely believe they are not like the majority of people.

                                                   Please reply to the list;
                                                         please *don't* CC me.

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
Any HTTP status code 400 or higher is treated as no valid website on the 
domain. I see a considerable amount of spam that returns 5xx codes so at 
this point I don't plan on changing that behavior. 503 is supposed to 
indicate a temporary condition so this seems like an abuse of the error 
code.
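For what it's worth, the cutoff boils down to a status-code test plus a 
connection attempt. Simplified, it behaves something like this (a rough 
sketch of the behavior described above, not the exact service code; the 
helper names are mine):

```python
import urllib.request
import urllib.error

def status_indicates_website(status: int) -> bool:
    """Per the rule above: any HTTP status code of 400 or higher
    (4xx/5xx) is treated as 'no valid website on the domain'."""
    return status < 400

def has_working_website(domain: str, timeout: float = 15.0) -> bool:
    """Try HTTPS then plain HTTP on the bare domain; connection
    failures count the same as 4xx/5xx responses."""
    for url in (f"https://{domain}/", f"http://{domain}/"):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if status_indicates_website(resp.status):
                    return True
        except urllib.error.HTTPError as err:
            # urlopen raises for 4xx/5xx; treat those as no website
            if status_indicates_website(err.code):
                return True
        except (urllib.error.URLError, OSError):
            pass  # DNS failure, refused connection, timeout, etc.
    return False
```

So Jari's 503-returning root domains would be classified the same way as 
domains with no web server at all.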

On 3/13/2019 2:21 PM, Jari Fredriksson wrote:
> What would it result for this:
>
> I have a couple of domains that do not have any services at the root domain name. However, the server the A record points to does have a web server that acts as a reverse proxy for many subdomains, which will be served a web page. An HTTP 503 is returned by the pound reverse for the root domains.


Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Jari Fredriksson <ja...@iki.fi>.
What would it result in for this:

I have a couple of domains that do not have any services at the root domain name. However, the server the A record points to does have a web server that acts as a reverse proxy for many subdomains, which will be served a web page. An HTTP 503 is returned by the pound reverse for the root domains.

gladiator:~ jarif$ curl -v http://bitwell.biz
* Rebuilt URL to: http://bitwell.biz/
*   Trying 138.201.119.25...
* TCP_NODELAY set
* Connected to bitwell.biz (138.201.119.25) port 80 (#0)
> GET / HTTP/1.1
> Host: bitwell.biz
> User-Agent: curl/7.54.0
> Accept: */*
> 
* HTTP 1.0, assume close after body
< HTTP/1.0 503 Service Unavailable
< Content-Type: text/html
< Content-Length: 53
< Expires: now
< Pragma: no-cache
< Cache-control: no-cache,no-store
< 
* Closing connection 0

Br. Jarif


Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Dominic Raferd <do...@timedicer.co.uk>.
On Wed, 13 Mar 2019 at 13:04, RW <rw...@googlemail.com> wrote:
>
> On Wed, 13 Mar 2019 10:53:06 +0000
> Dominic Raferd wrote:
>
> > On Wed, 13 Mar 2019 at 10:33, Mike Marynowski <mi...@singulink.com>
> > wrote:
> > >
> >
> > For those of us who are not SA experts can you give an example of how
> > to use your helpful new lookup facility (i.e. lines to add in
> > local.cf)? Thanks
>
>
> askdns AUTHOR_IN_HTTPCHECK  _AUTHORDOMAIN_.httpcheck.singulink.com A 1
>
> score  AUTHOR_IN_HTTPCHECK   0.1 # adjust as appropriate
>
> This assumes that Mail::SpamAssassin::Plugin::AskDNS is loaded, which
> it is by default.

Thanks, giving it a go...

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by RW <rw...@googlemail.com>.
On Wed, 13 Mar 2019 10:53:06 +0000
Dominic Raferd wrote:

> On Wed, 13 Mar 2019 at 10:33, Mike Marynowski <mi...@singulink.com>
> wrote:
> >  
> 
> For those of us who are not SA experts can you give an example of how
> to use your helpful new lookup facility (i.e. lines to add in
> local.cf)? Thanks


askdns AUTHOR_IN_HTTPCHECK  _AUTHORDOMAIN_.httpcheck.singulink.com A 1

score  AUTHOR_IN_HTTPCHECK   0.1 # adjust as appropriate 

This assumes that Mail::SpamAssassin::Plugin::AskDNS is loaded, which
it is by default.




Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Dominic Raferd <do...@timedicer.co.uk>.
On Wed, 13 Mar 2019 at 10:33, Mike Marynowski <mi...@singulink.com> wrote:
>

For those of us who are not SA experts can you give an example of how
to use your helpful new lookup facility (i.e. lines to add in
local.cf)? Thanks

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
Back up after some extensive modifications.

Setting the DNS request timeout to 30 seconds is no longer necessary - 
the service instantly responds to queries.

To prevent mail delivery issues when a website is having technical 
issues the first time a domain is seen by the service, it will instantly 
return a "valid domain" response (NXDOMAIN) with a 15-minute TTL. It 
will then queue up testing of this domain in the 
background and automatically keep retrying every few minutes if HTTP 
contact fails. After 15 minutes of failed HTTP contact, the DNS service 
will begin responding with an invalid domain response (127.0.0.1), 
exponentially increasing TTLs and time between background checks until 
it reaches about 17 hours between checks. The service automatically runs 
checks in the background for all domains queried within the last 30 days 
and instantly responds to DNS queries with the cached result. If a web 
server goes down, has technical issues, etc., it will still be reported 
as a valid domain for approximately 4 days after the last successful 
HTTP contact while continually being checked in the background, so 
temporary issues won't affect mail delivery.

On 3/11/2019 7:18 PM, RW wrote:
> It doesn't seem to be working. Is it gone?
>
>
>
> $ dig +norecurse @ns1.singulink.com hwvyuprmjpdrws.com.httpcheck.singulink.com
>
> ; <<>> DiG 9.11.0-P5 <<>> +norecurse @ns1.singulink.com hwvyuprmjpdrws.com.httpcheck.singulink.com
> ; (1 server found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: FORMERR, id: 57443
> ;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
> ...



Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by RW <rw...@googlemail.com>.
On Fri, 1 Mar 2019 01:21:40 -0500
Mike Marynowski wrote:

> For anyone who wants to play around with this, the DNS service has
> been posted. You can test the existence of a website on a domain or
> any of its parent domains by making DNS queries as follows:
> 
> subdomain.domain.com.httpcheck.singulink.com


It doesn't seem to be working. Is it gone?



$ dig +norecurse @ns1.singulink.com hwvyuprmjpdrws.com.httpcheck.singulink.com

; <<>> DiG 9.11.0-P5 <<>> +norecurse @ns1.singulink.com hwvyuprmjpdrws.com.httpcheck.singulink.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: FORMERR, id: 57443
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
...

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Andrea Venturoli <ml...@netfence.it>.
On 2019-03-01 07:21, Mike Marynowski wrote:
> For anyone who wants to play around with this, the DNS service has been 
> posted. You can test the existence of a website on a domain or any of 
> its parent domains by making DNS queries as follows:
> 
> subdomain.domain.com.httpcheck.singulink.com

Hello.
I was getting around to test this, but I can't seem to reach the service.
Is it still active?

  bye & Thanks
	av.

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
For anyone who wants to play around with this, the DNS service has been 
posted. You can test the existence of a website on a domain or any of 
its parent domains by making DNS queries as follows:

subdomain.domain.com.httpcheck.singulink.com

So, if you wanted to check if mail1.mx.google.com or any of its parent 
domains have a website, you would do a DNS query with a 30 second 
timeout for:

mail1.mx.google.com.httpcheck.singulink.com

This will check the following domains for a valid HTTP response within 
15 seconds:

mail1.mx.google.com
mx.google.com
google.com
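
That parent-domain walk is just successive label stripping. A naive 
sketch of it (my shorthand, not the actual service code; a real 
implementation would want a public-suffix list so multi-label suffixes 
like co.uk aren't treated as registrable domains):

```python
def domain_chain(fqdn: str) -> list:
    """Enumerate the domain and its parents, stopping one label short
    of the TLD, e.g. 'mail1.mx.google.com' -> ['mail1.mx.google.com',
    'mx.google.com', 'google.com'].  Naive: assumes a single-label TLD."""
    labels = fqdn.split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1)]
```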

If a valid HTTP response comes back then the DNS query will return 
NXDOMAIN with a 7 day TTL. If no valid HTTP response comes back then the 
DNS query will return 127.0.0.1 with progressively increasing TTLs:

#1: 2 mins
#2: 4 mins
#3: 6 mins
#4: 8 mins
#5: 10 mins
#6: 20 mins
#7: 30 mins
#8: 40 mins
#9: 50 mins
#10: 1 hour
#11: 2 hours
#12+: add 2 hours extra for each attempt up to 24h max
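
If I've laid the schedule out correctly, the TTL for the nth consecutive 
failed check boils down to something like this (a paraphrase in Python, 
not the service's actual code):

```python
def invalid_ttl_minutes(attempt: int) -> int:
    """TTL in minutes returned for the nth consecutive failed check,
    following the schedule listed above."""
    if attempt <= 5:
        return 2 * attempt             # 2, 4, 6, 8, 10 mins
    if attempt <= 10:
        return 10 * (attempt - 4)      # 20, 30, 40, 50, 60 mins
    # 2 hours at attempt 11, +2 hours per further attempt, 24h cap
    return min(120 * (attempt - 10), 24 * 60)
```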

As long as an invalid domain has been queried in the last 7 days, it 
will remain cached and any further invalid attempts will continue to 
progressively increase the TTL according to the rules above. If a domain 
doesn't get queried for 7 days then it drops out of the cache and its 
invalid attempt counter is reset. A valid HTTP response will reset the 
domain's invalid counter and a 7-day TTL is returned. Once a domain is in 
the cache, responses are immediate until the TTL runs out and the domain 
is rechecked again.


Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Benny Pedersen <me...@junc.eu>.
Ralph Seichter skrev den 2019-02-28 18:53:

> By the way, are you aware of https://www.dnswl.org ?

https://www.mywot.com
https://www.trustpilot.com

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Ralph Seichter <ab...@monksofcool.net>.
* David Jones:

> I would like to see an Open Mail Reputation System setup by a working
> group of big companies so it would have some weight behind it.

Running a smaller business, I have no interest whatsoever in a "group of
big companies" having any say in our mail reputation, as you can surely
understand. All our commercial email passes DKIM, SPF and DMARC tests
anyway.

By the way, are you aware of https://www.dnswl.org ?

-Ralph

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by David Jones <dj...@ena.com>.
On 2/28/19 10:50 AM, Ralph Seichter wrote:
> * Mike Marynowski:
> 
>> And the cat and mouse game continues :)
> 
> It sure does, and that's what sticks in my craw here: For a pro spammer,
> it is easy to set up websites in an automated fashion. If I was such a
> naughty person, I'd just add one tiny service that answers "all is well"
> for every incoming HTTP request.
> 
> Why even use a test for something that is so easily compromised?
> 
> -Ralph
> 

I would like to see an Open Mail Reputation System setup by a working 
group of big companies so it would have some weight behind it.  Set up 
some sort of scale like 0 to 100 for reputation that starts established 
domains older than X days at 50 (in the middle), and then have a clearing 
house for spam reports where it takes several different reports from 
different sources to lower a domain's score.  I am sure some smart 
Google engineers or SpamCop.net could do this in their spare time in a 
way that can't be abused or poisoned.

Newly registered domains that are less than X days old would start at 
zero or 25 and have to earn their increase in score over time.  Maybe 
every week without a report of spam the score goes up by some increment.

Domains would have to implement good SPF, DKIM, and DMARC to participate 
in this reputation system.  A postmaster address (maybe the DMARC 
reporting email address) would be required with a mail loop verification.

Bounce messages would have a clear/plain message with a link explaining 
why the message was bounced (because of a sender problem, not the 
recipient mail server problem).  Default to opt-in sending copies of the 
bounce message to the postmaster address and require mail admins to 
opt-out if they don't want it.  (A major problem in email support today 
is not having good contacts of admins on the other end.  End users don't 
know what to do with bounce messages and mail admins can't easily get 
together to work on delivery problems.)

-- 
David Jones

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Ralph Seichter <ab...@monksofcool.net>.
* Mike Marynowski:

> You know what I mean.

That's quite an assumption to make, in a mailing list. ;-)

> I could just not publish this and keep it for myself and I'm sure that
> would make it more effective long term for me, but I figured I would
> contribute it so that others can gain some benefit from it.

Sounds reasonable to me, as long as such a plugin is not activated as a
SpamAssassin default, and defaults to a low score when activated.

-Ralph

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by John Schmerold <sc...@gmail.com>.
Mike: If you want a tester, I am happy to join the effort, I see little 
harm in assigning 0.75 to the results.

There are quite a few email-only domains; we end up whitelist_auth'ing 
them and all is well.

John Schmerold
Katy Computer Systems, Inc
https://katycomputer.com
St Louis

On 2/28/2019 11:19 AM, Mike Marynowski wrote:
> You know what I mean. *Many (not all) of the rules (rDNS verification, 
> hostname check, SPF records, etc) are easy to circumvent but we still 
> check all that. Those simple checks still manage to catch a surprising 
> amount of spam.
>
> I could just not publish this and keep it for myself and I'm sure that 
> would make it more effective long term for me, but I figured I would 
> contribute it so that others can gain some benefit from it.
>
> If it doesn't become widespread and SpamAssassin isn't interested in 
> embedding it directly into their rule checks then that's fine by me, 
> I'm not going to cry about it...more spam catching for me and whoever 
> decides to install the plugin on their own servers. If it does become 
> widespread and some spammers adapt then I'll take solace in knowing I 
> helped a lot of people stop at least some of their spam.
>> * Mike Marynowski:
>>
>>> Everything we test for is easily compromised on its own.
>> That's quite a sweeping statement, and I disagree. IP-based real time
>> blacklists, anyone? Also, "we" is too unspecific. In addition to the
>> stock rules, I happen to maintain a set of custom tests which are
>> neither published nor easily circumvented. They have proven pretty
>> effective for us.
>>
>> -Ralph
>
>

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
You know what I mean. *Many (not all) of the rules (rDNS verification, 
hostname check, SPF records, etc) are easy to circumvent but we still 
check all that. Those simple checks still manage to catch a surprising 
amount of spam.

I could just not publish this and keep it for myself and I'm sure that 
would make it more effective long term for me, but I figured I would 
contribute it so that others can gain some benefit from it.

If it doesn't become widespread and SpamAssassin isn't interested in 
embedding it directly into their rule checks then that's fine by me, I'm 
not going to cry about it...more spam catching for me and whoever 
decides to install the plugin on their own servers. If it does become 
widespread and some spammers adapt then I'll take solace in knowing I 
helped a lot of people stop at least some of their spam.
> * Mike Marynowski:
>
>> Everything we test for is easily compromised on its own.
> That's quite a sweeping statement, and I disagree. IP-based real time
> blacklists, anyone? Also, "we" is too unspecific. In addition to the
> stock rules, I happen to maintain a set of custom tests which are
> neither published nor easily circumvented. They have proven pretty
> effective for us.
>
> -Ralph



Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Ralph Seichter <ab...@monksofcool.net>.
* Mike Marynowski:

> Everything we test for is easily compromised on its own.

That's quite a sweeping statement, and I disagree. IP-based real time
blacklists, anyone? Also, "we" is too unspecific. In addition to the
stock rules, I happen to maintain a set of custom tests which are
neither published nor easily circumvented. They have proven pretty
effective for us.

-Ralph

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
> Why even use a test for something that is so easily compromised?
> -Ralph

Everything we test for is easily compromised on its own.


Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Ralph Seichter <ab...@monksofcool.net>.
* Mike Marynowski:

> And the cat and mouse game continues :)

It sure does, and that's what sticks in my craw here: For a pro spammer,
it is easy to set up websites in an automated fashion. If I was such a
naughty person, I'd just add one tiny service that answers "all is well"
for every incoming HTTP request.

Why even use a test for something that is so easily compromised?

-Ralph

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
And the cat and mouse game continues :)

That said, all the big obvious "email-only domains" that send out 
newsletters and notifications and such that I've come across in my 
sampling already have placeholder websites or redirects to their main 
websites configured. I'm sure that's not always the case but the data I 
have indicates that's the exception and not the rule.

On 2/28/2019 11:37 AM, Ralph Seichter wrote:
> * Antony Stone:
>
>> Each to their own.
> Of course. Alas, if this gets widely adopted, we'll probably have to set
> up placeholder websites (as will spammers, I'm sure).
>
> -Ralph



Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Ralph Seichter <ab...@monksofcool.net>.
* Antony Stone:

> Each to their own.

Of course. Alas, if this gets widely adopted, we'll probably have to set
up placeholder websites (as will spammers, I'm sure).

-Ralph

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Antony Stone <An...@spamassassin.open.source.it>.
On Thursday 28 February 2019 at 17:14:04, Ralph Seichter wrote:

> * Grant Taylor:
> > Why would you do it per email? I would think that you would do the
> > test and cache the results for some amount of time.
> 
> I would not do it at all, caching or no caching. Personally, I don't see
> a benefit trying to correlate email with a website, as mentioned before,
> based on how we utilise email-only-domains.

Each to their own.

If a mail admin finds a good correlation between no-website and spam, it's a 
good check to add into the mix.

Nothing should be a poison pill in itself, and if you use email-only domains, 
you (they) still won't get blocked provided the emails they send don't 
otherwise look spammy.

Mike has already said:

On Thursday 28 February 2019 at 15:25:39, Mike Marynowski wrote:

> as a 100% ban rule this is obviously a bad idea. As a score modifier I think
> it would be highly effective.
> 
> I found several "email only" domains in my sampling but all the big ones
> still had landing pages at the root domain saying "this domain is only
> used for serving email" or similar. I'm sure there are exceptions and
> some people will have email only domains, but that's why we don't put
> 100% confidence into any one rule.

Personally I'm very interested in such a rule and its real-world effectiveness.


Antony.

-- 
Tinned food was developed for the British Navy in 1813.

The tin opener was not invented until 1858.

                                                   Please reply to the list;
                                                         please *don't* CC me.

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
I've tested this with good results, and I'm actually not creating any 
HTTPS connections - what I've found is that a single HTTP request with 
zero redirections is enough. If it returns a status code >= 400 then you 
treat it as having no valid website, and if you get a < 400 result (i.e. 
a 301/302 redirect or a 200 OK) then you can treat it as a valid 
website. You don't even need to receive the body of the HTTP response; 
you can quit after seeing the status.
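As an illustration, a minimal Python sketch of the check described above (the function names are mine, not from any actual plugin): a single plain-HTTP request, redirects not followed, and only the status line read.

```python
import http.client

def classify_status(status):
    """A status below 400 (200 OK, 301/302 redirect, ...) counts as a
    working website; 4xx/5xx or no response at all does not."""
    return status is not None and status < 400

def probe_domain(domain, timeout=5):
    """Issue one HTTP HEAD request and read only the status line.
    Redirects are not followed and the body is never fetched."""
    try:
        conn = http.client.HTTPConnection(domain, timeout=timeout)
        conn.request("HEAD", "/", headers={"Host": domain})
        status = conn.getresponse().status
        conn.close()
    except (OSError, http.client.HTTPException):
        status = None  # no listener, timeout, DNS failure, bad response
    return classify_status(status)
```

Some servers answer HEAD differently from GET, so a real implementation might fall back to a GET that abandons the connection after the status line arrives.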

And yes, as a 100% ban rule this is obviously a bad idea. As a score 
modifier I think it would be highly effective.

I found several "email only" domains in my sampling but all the big ones 
still had landing pages at the root domain saying "this domain is only 
used for serving email" or similar. I'm sure there are exceptions and 
some people will have email only domains, but that's why we don't put 
100% confidence into any one rule.

On 2/27/2019 7:57 PM, Grant Taylor wrote:
> On 02/27/2019 03:25 PM, Ralph Seichter wrote:
>> We use some of our domains specifically for email, with no associated 
>> website.
>
> I agree that /requiring/ a website at one of the parent domains 
> (stopping before traversing into the Public Suffix List) is 
> problematic and prone to false positives.
>
> There /may/ be some value to /some/ people in doing such a check and 
> altering the spam score.  (See below.)
>
>> Besides, I think the overhead to establish a HTTPS connection for 
>> every incoming email would be prohibitive.
>
> Why would you do it per email?  I would think that you would do the 
> test and cache the results for some amount of time.
>
>> There is a reason most whitelist/blacklist services use "cheap" DNS 
>> queries instead.
> I wonder if there is a way to hack DNS into doing this for us, i.e. a 
> custom DNS "server" (BIND's DLZ comes to mind) that can perform the 
> test(s) and fabricate an answer that could then be cached. Publish 
> these answers in a new zone / domain name, and treat it like another RBL.
>
> Meaning a query goes to the new RBL server, which does the necessary 
> $MAGIC to return an answer (possibly NXDOMAIN if there is a site and 
> 127.0.0.1 if there is no site) which can be cached by standard local / 
> recursive DNS servers.
>
>
>



Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
> I would not do it at all, caching or no caching. Personally, I don't see
> a benefit trying to correlate email with a website, as mentioned before,
> based on how we utilise email-only-domains.
>
> -Ralph

Fair enough. Based on the sampling I've done and the way I intend to use 
this, I still see this as a net benefit. If you're running an email-only 
domain then you're probably doing some pretty email-intensive stuff, and 
you should be well-configured enough that a nudge in the score shouldn't 
put you over the spam threshold. If you're a spammer just trying to make 
quick use of a domain and the spam score is already quite high but not 
quite over the threshold, then this can tip the score over into marking 
it as spam.


Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Antony Stone <An...@spamassassin.open.source.it>.
On Thursday 28 February 2019 at 20:25:36, Bill Cole wrote:

> On 28 Feb 2019, at 13:43, Mike Marynowski wrote:
> > On 2/28/2019 12:41 PM, Bill Cole wrote:
> >> You should probably put the envelope sender (i.e. the SA
> >> "EnvelopeFrom" pseudo-header) into that list, maybe even first. That
> >> will make many messages sent via discussion mailing lists (such as
> >> this one) pass your test where a test of real header domains would
> >> fail, while it is more likely to cause commercial bulk mail to
> >> fail where it would usually pass based on real standard headers.
> >> (That's based on a hunch, not testing.)
> > 
> > Can you clarify why you think my currently proposed headers would fail
> > with the mailing list? As far as I can tell, all the messages I've
> > received from this mailing list would pass just fine. As an example
> > from the emails in this list, which header value specifically would
> > cause it to fail?
> 
> If I did not explicitly set the Reply-To header, this message would be
> delivered without one. The domain part of the From header on messages I
> post to this and other mailing lists has no website and never will.

The same applies to my messages as well.  I use a list-specific 
"subdomain" on all my various list subscription addresses; however, 
unlike Bill, I never set a Reply-To address, because I expect all list 
replies to go to the list (which I then receive as a subscriber).

Any emails which are sent to my list-subscription addresses directly (ie: not 
via the mailing list server, which adds its own identifiable headers) are 
discarded.


Regards,


Antony.

-- 
It may not seem obvious, but (6 x 5 + 5) x 5 - 55 equals 5!

                                                   Please reply to the list;
                                                         please *don't* CC me.

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Grant Taylor <gt...@tnetconsulting.net>.
On 2/28/19 1:24 PM, Luis E. Muñoz wrote:
> I suggest you look at the Mozilla Public Suffix List at 
> https://publicsuffix.org/ — it was created for different purposes, but I 
> believe it maps well enough to my understanding of your use case. You'll 
> be able to pad the gaps using a custom list.

+1 for Mozilla's PSL.

Also, remember to stop at the domain before the PS(L).  (Another message 
mentioned co.uk or something like that.  That's a PS and shouldn't be 
checked.)
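To illustrate why the PSL matters here, a toy registered-domain lookup over a handful of hard-coded suffixes (the real list has thousands of entries plus wildcard and exception rules, so a maintained PSL library should be used in practice):

```python
# Toy suffix set for illustration only; the real Public Suffix List
# from publicsuffix.org is far larger and has wildcard/exception rules.
PUBLIC_SUFFIXES = {"com", "net", "org", "uk", "co.uk", "org.uk", "net.uk"}

def registered_domain(hostname):
    """Return the domain one label below the public suffix, or None
    if the hostname is itself a public suffix."""
    labels = hostname.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        parent = ".".join(labels[i + 1:])
        # the registered domain is the first non-suffix name whose
        # immediate parent IS a public suffix
        if candidate not in PUBLIC_SUFFIXES and parent in PUBLIC_SUFFIXES:
            return candidate
    return None
```

Note how both `stuff.co.uk` and `something.uk` resolve correctly, which a naive "last two labels" rule gets wrong.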



-- 
Grant. . . .
unix || die


Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
I'm pretty sure that, the way I ended up implementing it, everything is 
working fine and it's nice, simple and clean, but maybe there's some 
edge case that doesn't work properly. If there is, I haven't found it 
yet, so if you can think of one, let me know.

Since I'm sending an HTTP request to all subdomains simultaneously it 
doesn't really matter if I go one further than the actual root domain. A 
"co.uk" request will come back with no website so there's no need to 
special handle it. For example, if the email address being tested is 
bob@mail1.mx.stuff.co.uk, an HTTP request goes out to:

mail1.mx.stuff.co.uk
mx.stuff.co.uk
stuff.co.uk
co.uk

The last one will always be cached from a previous .co.uk address lookup 
so it won't actually be sent out anyway. If any of them respond with a 
valid website then an OK result is returned.
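The fan-out described above can be generated in a line or two; a sketch (the function name is mine):

```python
def candidate_domains(email_domain):
    """List the domain and every parent suffix down to the two-label
    tail, matching the mail1.mx.stuff.co.uk walk shown above."""
    labels = email_domain.lower().rstrip(".").split(".")
    # stop before the single-label TLD ("uk"), which is never probed
    return [".".join(labels[i:]) for i in range(len(labels) - 1)]
```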

On 2/28/2019 3:24 PM, Luis E. Muñoz wrote:
> This is more complicated than it seems. I have the t-shirt to prove it.
>
> I suggest you look at the Mozilla Public Suffix List at 
> https://publicsuffix.org/ — it was created for different purposes, but 
> I believe it maps well enough to my understanding of your use case. 
> You'll be able to pad the gaps using a custom list.
>
> Best regards
>
> -lem



Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by "Luis E. Muñoz" <sa...@lem.click>.
On 28 Feb 2019, at 11:53, Mike Marynowski wrote:

> There are many ways to determine what the root domain is. One way is 
> analyzing the DNS response from the query to realize it's actually a 
> root domain, or you can just grab the ICANN TLD list and use that to 
> make a determination.
>
> What I'm probably going to do now that I'm building this as a cached 
> DNS service is just walk up the subdomains until I hit the root domain 
> and if any of them have a website then it's fine.

This is more complicated than it seems. I have the t-shirt to prove it.

I suggest you look at the Mozilla Public Suffix List at 
https://publicsuffix.org/ — it was created for different purposes, 
but I believe it maps well enough to my understanding of your use case. 
You'll be able to pad the gaps using a custom list.

Best regards

-lem

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
There are many ways to determine what the root domain is. One way is 
analyzing the DNS response from the query to realize it's actually a 
root domain, or you can just grab the ICANN TLD list and use that to 
make a determination.

What I'm probably going to do now that I'm building this as a cached DNS 
service is just walk up the subdomains until I hit the root domain and 
if any of them have a website then it's fine.
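A client-side lookup against such a cached DNS service could follow the RBL convention Grant suggested earlier in the thread. A sketch, with a hypothetical zone name and the NXDOMAIN / 127.0.0.1 answer convention assumed:

```python
import socket

# Hypothetical listing zone; a real deployment would publish its own.
NO_WEBSITE_ZONE = "no-website.bl.example.invalid"

def bl_query_name(domain, zone=NO_WEBSITE_ZONE):
    """Build the RBL-style lookup name, e.g. 'stuff.co.uk.<zone>'."""
    return domain.rstrip(".") + "." + zone

def listed_as_no_website(domain):
    """An A record of 127.0.0.1 means 'no website found'; NXDOMAIN
    means the domain is not listed (a website was found and cached)."""
    try:
        answer = socket.gethostbyname(bl_query_name(domain))
    except socket.gaierror:
        return False  # NXDOMAIN: not listed
    return answer == "127.0.0.1"
```

The resolver's normal caching then gives every downstream mail server the benefit of earlier probes.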

On 2/28/2019 2:39 PM, Antony Stone wrote:
> On Thursday 28 February 2019 at 20:33:42, Mike Marynowski wrote:
>
>> But scconsult.com does in fact have a website so I'm not sure what you
>> mean. This method checks the *root* domain, not the subdomain.
> How do you identify the root domain, given an email address?
>
> For example, for many years in the UK, it was possible to get something.co.uk
> or something.org.uk (and maybe something.net.uk), but now it is also possible
> to get something.uk
>
> So, I'm just wondering how you determine what the "root" domain for a given
> email address is.
>
>
> Antony.
>



Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 28 Feb 2019, at 14:39, Antony Stone wrote:

> On Thursday 28 February 2019 at 20:33:42, Mike Marynowski wrote:
>
>> But scconsult.com does in fact have a website so I'm not sure what you
>> mean. This method checks the *root* domain, not the subdomain.
>
> How do you identify the root domain, given an email address?

Mail::SpamAssassin::RegistryBoundaries, of course! :)

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Antony Stone <An...@spamassassin.open.source.it>.
On Thursday 28 February 2019 at 20:33:42, Mike Marynowski wrote:

> But scconsult.com does in fact have a website so I'm not sure what you
> mean. This method checks the *root* domain, not the subdomain.

How do you identify the root domain, given an email address?

For example, for many years in the UK, it was possible to get something.co.uk 
or something.org.uk (and maybe something.net.uk), but now it is also possible 
to get something.uk

So, I'm just wondering how you determine what the "root" domain for a given 
email address is.


Antony.

-- 
"It is easy to be blinded to the essential uselessness of them by the sense of 
achievement you get from getting them to work at all. In other words - and 
this is the rock solid principle on which the whole of the Corporation's 
Galaxy-wide success is founded - their fundamental design flaws are completely 
hidden by their superficial design flaws."

 - Douglas Noel Adams

                                                   Please reply to the list;
                                                         please *don't* CC me.

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Rupert Gallagher <ru...@protonmail.com>.
On Fri, Mar 1, 2019 at 23:14, Mike Marynowski <mi...@singulink.com> wrote:

>> Does SpamAssassin even have facilities to do that?

> Yes, if SPF runs at priority 1, you can define your test at priority 2, so SA executes them in the given order.

>> Don't all rules run all the time?

> They run when relevant, in the given order, and they do what they say, so if you say that the web test stops if the SPF test succeeds, then SA does it.

>> SpamAssassin still needs to run all the rules because MTAs might have different spam mark / spam delete / etc. thresholds than the one set in SA.
>
>> The number of cycles you're talking about is the same as an RBL lookup so I really don't see it as being significant. The DNS service does all the heavy lifting and I'm planning to make it public.
>
> It is significant if you have many emails to process. It is even more significant if you run the test locally.
>
> On 3/1/2019 5:09 PM, Rupert Gallagher wrote:
>
>> Case study:
>>
>> example.com bans any e-mail sent from its third-level domains up, and does it by SPF.
>>
>> spf-banned.example.com sent mail, and my SA at server.com adds a big fat penalty, high enough to bounce it.
>>
>> Suppose I do not bounce it, and use your filter to check for its websites. It turns out that both example.com and spf-banned.example.com have a website. Was it worth it to spend cycles on it? I guess not. SPF is an accepted RFC and it should have priority. So, I recommend that the website test first read the result of the SPF test, quit when positive, and continue otherwise.
>>
>> --- ruga
>>
>> On Fri, Mar 1, 2019 at 22:31, Grant Taylor <gt...@tnetconsulting.net> wrote:
>>
>>> On 02/28/2019 09:39 PM, Mike Marynowski wrote:
>>>> I modified it so it checks the root domain and all subdomains up to the
>>>> email domain.
>>>
>>> :-)
>>>
>>>> As for your question - if afraid.org has a website then you are correct,
>>>> all subdomains of afraid.org will not flag this rule, but if lots of
>>>> afraid.org subdomains are sending spam then I imagine other spam
>>>> detection methods will have a good chance of catching it.
>>>
>>> ACK
>>>
> afraid.org is much like DynDNS in that one entity (afraid.org themselves
> or DynDNS) provides DNS services for other entities.
>>>
>>> I don't see a good way to differentiate between the sets of entities.
>>>
>>>> I'm not sure what you mean by "working up the tree" - if afraid.org has
>>>> a website and I work my way up the tree then either way eventually I'll
>>>> hit afraid.org and get a valid website, no?
>>>
>>> True.
>>>
>>> I wonder if there is any value in detecting zone boundaries via not
>>> going any higher up the tree past the zone that's containing the email
>>> domain(s).
>>>
>>> Perhaps something like that would enable differentiation between Afraid
>>> & DynDNS and the entities that they are hosting DNS services for.
>>> (Assuming that there are separate zones.)
>>>
>>>> My current implementation fires off concurrent HTTP requests to the root
>>>> domain and all subdomains up to the email domain and waits for a valid
>>>> answer from any of them.
>>>
>>> ACK
>>>
>>> s/up to/down to/
>>>
>>> I don't grok the value of doing this as well as you do. But I think
>>> your use case is enough different than mine such that I can't make an
>>> objective value estimate.
>>>
>>> That being said, I do find the idea technically interesting, even if I
>>> think I'll not utilize it.
>>>
>>> --
>>> Grant. . . .
>>> unix || die

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by RW <rw...@googlemail.com>.
On Fri, 01 Mar 2019 22:09:01 +0000
Rupert Gallagher wrote:

> Case study:
> 
> example.com bans any e-mail sent from its third-level domains up, and
> does it by SPF.
> 
> spf-banned.example.com sent mail, and my SA at server.com adds a big
> fat penalty, high enough to bounce it.


example.com has a TXT record of "v=spf1 -all" 

spf-banned.example.com has no TXT record at all

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
Does SpamAssassin even have facilities to do that? Don't all rules run 
all the time? SpamAssassin still needs to run all the rules because MTAs 
might have different spam mark / spam delete / etc. thresholds than the 
one set in SA.

The number of cycles you're talking about is the same as an RBL lookup 
so I really don't see it as being significant. The DNS service does all 
the heavy lifting and I'm planning to make it public.
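For reference, SpamAssassin's Shortcircuit plugin provides roughly the early-exit ordering being discussed here; a hedged sketch, with the priority value purely illustrative (see the stock 60_shortcircuit.cf shipped with SA for the real examples):

```
loadplugin Mail::SpamAssassin::Plugin::Shortcircuit

# Evaluate SPF early, and stop scanning further rules once it fires.
priority     SPF_PASS  -500
shortcircuit SPF_PASS  on
```

Note that short-circuiting trades a complete final score for speed, which is exactly the concern about MTAs applying different thresholds.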

On 3/1/2019 5:09 PM, Rupert Gallagher wrote:
> Case study:
>
> example.com bans any e-mail sent from its third-level domains up, and 
> does it by SPF.
>
> spf-banned.example.com sent mail, and my SA at server.com adds a big 
> fat penalty, high enough to bounce it.
>
> Suppose I do not bounce it, and use your filter to check for its 
> websites. It turns out that both example.com and 
> spf-banned.example.com have a website. Was it worth it to spend cycles 
> on it? I guess not. SPF is an accepted RFC and it should have 
> priority. So, I recommend that the website test first read the result 
> of the SPF test, quit when positive, and continue otherwise.
>
> --- ruga

>
>
>
> On Fri, Mar 1, 2019 at 22:31, Grant Taylor <gtaylor@tnetconsulting.net 
> <ma...@tnetconsulting.net>> wrote:
>> On 02/28/2019 09:39 PM, Mike Marynowski wrote:
>> > I modified it so it checks the root domain and all subdomains up to the
>> > email domain.
>>
>> :-)
>>
>> > As for your question - if afraid.org has a website then you are 
>> correct,
>> > all subdomains of afraid.org will not flag this rule, but if lots of
>> > afraid.org subdomains are sending spam then I imagine other spam
>> > detection methods will have a good chance of catching it.
>>
>> ACK
>>
>> afraid.org is much like DynDNS in that one entity (afraid.org themselves
>> or DynDNS) provides DNS services for other entities.
>>
>> I don't see a good way to differentiate between the sets of entities.
>>
>> > I'm not sure what you mean by "working up the tree" - if afraid.org has
>> > a website and I work my way up the tree then either way eventually I'll
>> > hit afraid.org and get a valid website, no?
>>
>> True.
>>
>> I wonder if there is any value in detecting zone boundaries via not
>> going any higher up the tree past the zone that's containing the email
>> domain(s).
>>
>> Perhaps something like that would enable differentiation between Afraid
>> & DynDNS and the entities that they are hosting DNS services for.
>> (Assuming that there are separate zones.)
>>
>> > My current implementation fires off concurrent HTTP requests to the 
>> root
>> > domain and all subdomains up to the email domain and waits for a valid
>> > answer from any of them.
>>
>> ACK
>>
>> s/up to/down to/
>>
>> I don't grok the value of doing this as well as you do. But I think
>> your use case is enough different than mine such that I can't make an
>> objective value estimate.
>>
>> That being said, I do find the idea technically interesting, even if I
>> think I'll not utilize it.
>>
>>
>>
>> --
>> Grant. . . .
>> unix || die
>>
>
>


Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Rupert Gallagher <ru...@protonmail.com>.
Case study:

example.com bans any e-mail sent from its third-level domains up, and does it by SPF.

spf-banned.example.com sent mail, and my SA at server.com adds a big fat penalty, high enough to bounce it.

Suppose I do not bounce it, and use your filter to check for its websites. It turns out that both example.com and spf-banned.example.com have a website. Was it worth it to spend cycles on it? I guess not. SPF is an accepted RFC and it should have priority. So, I recommend that the website test first read the result of the SPF test, quit when positive, and continue otherwise.

--- ruga

On Fri, Mar 1, 2019 at 22:31, Grant Taylor <gt...@tnetconsulting.net> wrote:

> On 02/28/2019 09:39 PM, Mike Marynowski wrote:
>> I modified it so it checks the root domain and all subdomains up to the
>> email domain.
>
> :-)
>
>> As for your question - if afraid.org has a website then you are correct,
>> all subdomains of afraid.org will not flag this rule, but if lots of
>> afraid.org subdomains are sending spam then I imagine other spam
>> detection methods will have a good chance of catching it.
>
> ACK
>
> afraid.org is much like DynDNS in that one entity (afraid.org themselves
> or DynDNS) provides DNS services for other entities.
>
> I don't see a good way to differentiate between the sets of entities.
>
>> I'm not sure what you mean by "working up the tree" - if afraid.org has
>> a website and I work my way up the tree then either way eventually I'll
>> hit afraid.org and get a valid website, no?
>
> True.
>
> I wonder if there is any value in detecting zone boundaries via not
> going any higher up the tree past the zone that's containing the email
> domain(s).
>
> Perhaps something like that would enable differentiation between Afraid
> & DynDNS and the entities that they are hosting DNS services for.
> (Assuming that there are separate zones.)
>
>> My current implementation fires off concurrent HTTP requests to the root
>> domain and all subdomains up to the email domain and waits for a valid
>> answer from any of them.
>
> ACK
>
> s/up to/down to/
>
> I don't grok the value of doing this as well as you do. But I think
> your use case is enough different than mine such that I can't make an
> objective value estimate.
>
> That being said, I do find the idea technically interesting, even if I
> think I'll not utilize it.
>
> --
> Grant. . . .
> unix || die

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
On 3/1/2019 4:31 PM, Grant Taylor wrote:
> afraid.org is much like DynDNS in that one entity (afaid.org 
> themselves or DynDNS) provide DNS services for other entities.
>
> I don't see a good way to differentiate between the sets of entities.

I haven't come across any notable amount of spam that punches through 
all the other detection methods in place while using a reply-to/from 
email address on a subdomain of a service like that. I'm sure it happens 
though, and in that case this filter simply won't add any value.


Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Grant Taylor <gt...@tnetconsulting.net>.
On 02/28/2019 09:39 PM, Mike Marynowski wrote:
> I modified it so it checks the root domain and all subdomains up to the 
> email domain.

:-)

> As for your question - if afraid.org has a website then you are correct, 
> all subdomains of afraid.org will not flag this rule, but if lots of 
> afraid.org subdomains are sending spam then I imagine other spam 
> detection methods will have a good chance of catching it.

ACK

afraid.org is much like DynDNS in that one entity (afraid.org themselves 
or DynDNS) provides DNS services for other entities.

I don't see a good way to differentiate between the sets of entities.

> I'm not sure what you mean by "working up the tree" - if afraid.org has 
> a website and I work my way up the tree then either way eventually I'll 
> hit afraid.org and get a valid website, no?

True.

I wonder if there is any value in detecting zone boundaries via not 
going any higher up the tree past the zone that's containing the email 
domain(s).

Perhaps something like that would enable differentiation between Afraid 
& DynDNS and the entities that they are hosting DNS services for. 
(Assuming that there are separate zones.)

> My current implementation fires off concurrent HTTP requests to the root 
> domain and all subdomains up to the email domain and waits for a valid 
> answer from any of them.

ACK

s/up to/down to/

I don't grok the value of doing this as well as you do.  But I think 
your use case is enough different than mine such that I can't make an 
objective value estimate.

That being said, I do find the idea technically interesting, even if I 
think I'll not utilize it.



-- 
Grant. . . .
unix || die


Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
I modified it so it checks the root domain and all subdomains up to the 
email domain.

As for your question - if afraid.org has a website then you are correct, 
all subdomains of afraid.org will not flag this rule, but if lots of 
afraid.org subdomains are sending spam then I imagine other spam 
detection methods will have a good chance of catching it.

I'm not sure what you mean by "working up the tree" - if afraid.org has 
a website and I work my way up the tree then either way eventually I'll 
hit afraid.org and get a valid website, no?

My current implementation fires off concurrent HTTP requests to the root 
domain and all subdomains up to the email domain and waits for a valid 
answer from any of them.
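That concurrent fan-out could be sketched with a thread pool, where probe() stands in for the single-request HTTP check discussed earlier in the thread (names are mine, not from an actual implementation):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def any_has_website(domains, probe):
    """Probe every candidate domain concurrently and return True as
    soon as any probe reports a valid website.  Note the executor
    still waits for outstanding probes when the block exits."""
    if not domains:
        return False
    with ThreadPoolExecutor(max_workers=len(domains)) as pool:
        futures = [pool.submit(probe, d) for d in domains]
        for done in as_completed(futures):
            if done.result():
                return True
    return False
```

Usage: `any_has_website(candidate_list, probe_function)`, where the probe function takes a domain and returns a boolean.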

On 2/28/2019 10:27 PM, Grant Taylor wrote:
> What about domains that have many client subdomains?
>
> afraid.org (et al) come to mind.
>
> You might end up allowing email from spammer.afraid.org who doesn't 
> have a website because the parent afraid.org does have a website.
>
> I would think that checking from the child and working up the tree 
> would be more accurate, even if it may take longer.
>
>
>



Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Grant Taylor <gt...@tnetconsulting.net>.
On 2/28/19 12:33 PM, Mike Marynowski wrote:
> This method checks the *root* domain, not the subdomain.

What about domains that have many client subdomains?

afraid.org (et al) come to mind.

You might end up allowing email from spammer.afraid.org who doesn't have 
a website because the parent afraid.org does have a website.

I would think that checking from the child and working up the tree would 
be more accurate, even if it may take longer.



-- 
Grant. . . .
unix || die


Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 28 Feb 2019, at 14:33, Mike Marynowski wrote:

> But scconsult.com does in fact have a website so I'm not sure what you 
> mean. This method checks the *root* domain, not the subdomain.

Ah, I see. I had missed that detail.

That's likely to have fewer issues, as long as you get the registry 
boundary correct. SA actually helps with that: see 
Mail::SpamAssassin::RegistryBoundaries.

> Even if this wasn't the case, well, it is what it is. Emails from this 
> mailing list (and most well-configured lists) come in at a spam score 
> of -6, so they are at no risk of being blocked even if a non-website 
> domain triggers this particular rule.
>
> On 2/28/2019 2:25 PM, Bill Cole wrote:
>> On 28 Feb 2019, at 13:43, Mike Marynowski wrote:
>>
>>> On 2/28/2019 12:41 PM, Bill Cole wrote:
>>>> You should probably put the envelope sender (i.e. the SA 
>>>> "EnvelopeFrom" pseudo-header) into that list, maybe even first. 
>>>> That will make many messages sent via discussion mailing lists 
>>>> (such as this one) pass your test where a test of real header 
>>>> domains would fail, while it is more likely to cause commercial 
>>>> bulk mail to fail where it would usually pass based on real 
>>>> standard headers. (That's based on a hunch, not testing.)
>>> Can you clarify why you think my currently proposed headers would 
>>> fail with the mailing list? As far as I can tell, all the messages 
>>> I've received from this mailing list would pass just fine. As an 
>>> example from the emails in this list, which header value 
>>> specifically would cause it to fail?
>>
>> If I did not explicitly set the Reply-To header, this message would 
>> be delivered without one. The domain part of the From header on 
>> messages I post to this and other mailing lists has no website and 
>> never will.
>>

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
But scconsult.com does in fact have a website so I'm not sure what you 
mean. This method checks the *root* domain, not the subdomain.

Even if this wasn't the case, well, it is what it is. Emails from this 
mailing list (and most well-configured lists) come in at a spam score of 
-6, so they are at no risk of being blocked even if a non-website domain 
triggers this particular rule.

On 2/28/2019 2:25 PM, Bill Cole wrote:
> On 28 Feb 2019, at 13:43, Mike Marynowski wrote:
>
>> On 2/28/2019 12:41 PM, Bill Cole wrote:
>>> You should probably put the envelope sender (i.e. the SA 
>>> "EnvelopeFrom" pseudo-header) into that list, maybe even first. That 
>>> will make many messages sent via discussion mailing lists (such as 
>>> this one) pass your test where a test of real header domains would 
>>> fail, while it is more likely to cause commercial bulk mail to 
>>> fail where it would usually pass based on real standard headers. 
>>> (That's based on a hunch, not testing.)
>> Can you clarify why you think my currently proposed headers would 
>> fail with the mailing list? As far as I can tell, all the messages 
>> I've received from this mailing list would pass just fine. As an 
>> example from the emails in this list, which header value specifically 
>> would cause it to fail?
>
> If I did not explicitly set the Reply-To header, this message would be 
> delivered without one. The domain part of the From header on messages 
> I post to this and other mailing lists has no website and never will.
>


Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 28 Feb 2019, at 13:43, Mike Marynowski wrote:

> On 2/28/2019 12:41 PM, Bill Cole wrote:
>> You should probably put the envelope sender (i.e. the SA 
>> "EnvelopeFrom" pseudo-header) into that list, maybe even first. That 
>> will make many messages sent via discussion mailing lists (such as 
>> this one) pass your test where a test of real header domains would 
>> fail, while it it is more likely to cause commercial bulk mail to 
>> fail where it would usually pass based on real standard headers. 
>> (That's based on a hunch, not testing.)
> Can you clarify why you think my currently proposed headers would fail 
> with the mailing list? As far as I can tell, all the messages I've 
> received from this mailing list would pass just fine. As an example 
> from the emails in this list, which header value specifically would 
> cause it to fail?

If I did not explicitly set the Reply-To header, this message would be 
delivered without one. The domain part of the From header on messages I 
post to this and other mailing lists has no website and never will.

-- 
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Available For Hire: https://linkedin.com/in/billcole

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
On 2/28/2019 12:41 PM, Bill Cole wrote:
> You should probably put the envelope sender (i.e. the SA 
> "EnvelopeFrom" pseudo-header) into that list, maybe even first. That 
> will make many messages sent via discussion mailing lists (such as 
> this one) pass your test where a test of real header domains would 
> fail, while it is more likely to cause commercial bulk mail to fail 
> where it would usually pass based on real standard headers. (That's 
> based on a hunch, not testing.)
Can you clarify why you think my currently proposed headers would fail 
with the mailing list? As far as I can tell, all the messages I've 
received from this mailing list would pass just fine. As an example from 
the emails in this list, which header value specifically would cause it 
to fail?

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Rupert Gallagher <ru...@protonmail.com>.
The focus was on the To header for mailing lists, complaints about MUAs, and people's choices. If you do not want to appear in the To header of a list, you are exercising a legal right under the GDPR. So, to cut through all those problems and enforce a sound solution, I suggest list majordomos do the compliance heavy lifting by forcing a sane To header. That's all. If you want to talk more in general about GDPR, I do it every day, so leave me alone on weekends, will you? :-)

On Fri, Mar 1, 2019 at 22:41, Grant Taylor <gt...@tnetconsulting.net> wrote:

> On 03/01/2019 01:25 AM, Rupert Gallagher wrote:
>> A future-proof list that complies with GDPR would automatically rewrite
>> the To header, leaving the list address only.
>
> Doesn't GDPR also include things like signatures? Thus if the mailing
> list is only modifying the email metadata and not the message body (thus
> signature), then it's still subject to GDPR.
>
> I also feel like it is a disservice to the mailing list to hide who the
> message is from. But I have no idea of the legalities of (not) doing such.
>
>> Any other recipient will still receive it from the original sender.
>
> I presume you're talking about (B)CC and additional To recipients.
>
> I never did hear, how does GDPR play out in such a scenario. Does the
> sender need to make a request to all To / (B)CC recipients for them to
> forget the sender? Also, does the mailing list operator have any
> responsibility to pass the request on to all subscribers to purge the
> requester from their personal archives? I feel like there's a LOT of
> unaddressed issues here, and that singling out the mailing list is
> somewhat unfair. But life's unfair. So … ¯\_(ツ)_/¯
>
> --
> Grant. . . .
> unix || die

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Grant Taylor <gt...@tnetconsulting.net>.
On 03/01/2019 01:25 AM, Rupert Gallagher wrote:
> A future-proof list that complies with GDPR would automatically rewrite 
> the To header, leaving the list address only.

Doesn't GDPR also include things like signatures?  Thus if the mailing 
list is only modifying the email metadata and not the message body (thus 
signature), then it's still subject to GDPR.

I also feel like it is a disservice to the mailing list to hide who the 
message is from.  But I have no idea of the legalities of (not) doing such.

> Any other recipient will still receive it from the original sender.

I presume you're talking about (B)CC and additional To recipients.

I never did hear, how does GDPR play out in such a scenario.  Does the 
sender need to make a request to all To / (B)CC recipients for them to 
forget the sender?  Also, does the mailing list operator have any 
responsibility to pass the request on to all subscribers to purge the 
requester from their personal archives?  I feel like there's a LOT of 
unaddressed issues here, and that singling out the mailing list is 
somewhat unfair.  But life's unfair.  So … ¯\_(ツ)_/¯



-- 
Grant. . . .
unix || die


Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Rupert Gallagher <ru...@protonmail.com>.
A future-proof list that complies with GDPR would automatically rewrite the To header, leaving the list address only. Any other recipient will still receive it from the original sender.

On Thu, Feb 28, 2019 at 20:29, Mike Marynowski <mi...@singulink.com> wrote:

> Unfortunately I don't see a reply-to header on your messages. What do
> you have it set to? I thought mailing lists see who is in the "to"
> section of a reply so that 2 copies aren't sent out. The "mailing list
> ethics" guide I read said to always use "reply all" and the mailing list
> system takes care of not sending duplicate replies.
>
> I removed your direct email from this reply and only kept the mailing
> list address, but for the record I don't see any reply-to headers.
>
> On 2/28/2019 2:21 PM, Bill Cole wrote:
>> Please respect my consciously set Reply-To header. I don't ever need 2
>> copies of a message posted to a mailing list, and ignoring that header
>> is rude.
>>
>> On 28 Feb 2019, at 13:28, Mike Marynowski wrote:
>>
>>> On 2/28/2019 12:41 PM, Bill Cole wrote:
>>>> You should probably put the envelope sender (i.e. the SA
>>>> "EnvelopeFrom" pseudo-header) into that list, maybe even first. That
>>>> will make many messages sent via discussion mailing lists (such as
>>>> this one) pass your test where a test of real header domains would
>>>> fail, while it is more likely to cause commercial bulk mail to
>>>> fail where it would usually pass based on real standard headers.
>>>> (That's based on a hunch, not testing.)
>>>
>>> Hmmm. I'll have to give some more thought into the exact headers it
>>> decides to test. I'm not sure if my MTA puts in envelope info into
>>> the SA request or not. For sake of simplicity right now I might just
>>> ignore mailing lists, I don't know. What I do know is that in the
>>> spam messages I'm reviewing right now, the reply-to / from headers
>>> set often don't have websites at those domains and none of them are
>>> masquerading as mailing lists. I haven't thought through the
>>> situation with mailing lists yet.
>>>
>>> I'm new to this whole SA plugin dev process - can you suggest the
>>> best way to log the full requests that SA receives so I can see what
>>> info it is getting and what I have to work with?
>>
>> The best way to see far too much information about what SA is doing is
>> to add a "-D all" to the invocation of the spamassassin script. You
>> can also add that to the flags used by spamd, if you want to punish
>> your logging subsystem
>>

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 28 Feb 2019, at 21:10, Mike Marynowski wrote:

> Thunderbird normally shows reply-to in normal messages...is this 
> something that some MUAs ignore just on mailing list emails or all 
> emails?

I cannot keep track of all the irrational things done by all MUAs. I'm 
not even surprised by anything new and wrong I see any more.

It is certainly POSSIBLE that because *some* mailing lists impose a 
fixed Reply-To, some MUAs ignore it on list messages. It is certain that 
some just ignore it no matter what because they are written by fools. 
Also, some people configure their MUA to "reply all" by default all the 
time, overriding any normal Reply-To support.

> Because I see reply-to on plenty of other emails.

You can see it for yourself in the raw message at the list archive: 
http://mail-archives.apache.org/mod_mbox/spamassassin-users/201902.mbox/raw/%3cE54C4C01-CFCC-430B-8A52-6D8D866DB038@billmail.scconsult.com%3e

>
> On 2/28/2019 3:44 PM, Bill Cole wrote:
>> On 28 Feb 2019, at 14:29, Mike Marynowski wrote:
>>
>>> Unfortunately I don't see a reply-to header on your messages. What 
>>> do you have it set to? I thought mailing lists see who is in the 
>>> "to" section of a reply so that 2 copies aren't sent out. The 
>>> "mailing list ethics" guide I read said to always use "reply all" 
>>> and the mailing list system takes care of not sending duplicate 
>>> replies.
>>>
>>> I removed your direct email from this reply and only kept the 
>>> mailing list address, but for the record I don't see any reply-to 
>>> headers.
>>
>> But it's right there in the copy that the list delivered to me:
>>
>>     From: "Bill Cole" <sa...@billmail.scconsult.com>
>>     To: users@spamassassin.apache.org
>>     Subject: Re: Spam rule for HTTP/HTTPS request to sender's 
>> root domain
>>     Date: Thu, 28 Feb 2019 14:21:41 -0500
>>     Reply-To: users@spamassassin.apache.org
>>
>> Whether you see it is a function of how your MUA (TBird, it seems... 
>> ) displays messages. Unfortunately, it has become common for MUAs 
>> to simply ignore Reply-To. I didn't think TBird was in that class.

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
Thunderbird normally shows reply-to in normal messages...is this 
something that some MUAs ignore just on mailing list emails or all 
emails? Because I see reply-to on plenty of other emails.

On 2/28/2019 3:44 PM, Bill Cole wrote:
> On 28 Feb 2019, at 14:29, Mike Marynowski wrote:
>
>> Unfortunately I don't see a reply-to header on your messages. What do 
>> you have it set to? I thought mailing lists see who is in the "to" 
>> section of a reply so that 2 copies aren't sent out. The "mailing 
>> list ethics" guide I read said to always use "reply all" and the 
>> mailing list system takes care of not sending duplicate replies.
>>
>> I removed your direct email from this reply and only kept the mailing 
>> list address, but for the record I don't see any reply-to headers.
>
> But it's right there in the copy that the list delivered to me:
>
>     From: "Bill Cole" <sa...@billmail.scconsult.com>
>     To: users@spamassassin.apache.org
>     Subject: Re: Spam rule for HTTP/HTTPS request to sender's root domain
>     Date: Thu, 28 Feb 2019 14:21:41 -0500
>     Reply-To: users@spamassassin.apache.org
>
> Whether you see it is a function of how your MUA (TBird, it seems... ) 
> displays messages. Unfortunately, it has become common for MUAs to simply 
> ignore Reply-To. I didn't think TBird was in that class.



Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 28 Feb 2019, at 14:29, Mike Marynowski wrote:

> Unfortunately I don't see a reply-to header on your messages. What do 
> you have it set to? I thought mailing lists see who is in the "to" 
> section of a reply so that 2 copies aren't sent out. The "mailing list 
> ethics" guide I read said to always use "reply all" and the mailing 
> list system takes care of not sending duplicate replies.
>
> I removed your direct email from this reply and only kept the mailing 
> list address, but for the record I don't see any reply-to headers.

But it's right there in the copy that the list delivered to me:

	From: "Bill Cole" <sa...@billmail.scconsult.com>
	To: users@spamassassin.apache.org
	Subject: Re: Spam rule for HTTP/HTTPS request to sender's root domain
	Date: Thu, 28 Feb 2019 14:21:41 -0500
	Reply-To: users@spamassassin.apache.org

Whether you see it is a function of how your MUA (TBird, it seems... ) 
displays messages. Unfortunately, it has become common for MUAs to simply 
ignore Reply-To. I didn't think TBird was in that class.

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
Unfortunately I don't see a reply-to header on your messages. What do 
you have it set to? I thought mailing lists see who is in the "to" 
section of a reply so that 2 copies aren't sent out. The "mailing list 
ethics" guide I read said to always use "reply all" and the mailing list 
system takes care of not sending duplicate replies.

I removed your direct email from this reply and only kept the mailing 
list address, but for the record I don't see any reply-to headers.

On 2/28/2019 2:21 PM, Bill Cole wrote:
> Please respect my consciously set Reply-To header. I don't ever need 2 
> copies of a message posted to a mailing list, and ignoring that header 
> is rude.
>
> On 28 Feb 2019, at 13:28, Mike Marynowski wrote:
>
>> On 2/28/2019 12:41 PM, Bill Cole wrote:
>>> You should probably put the envelope sender (i.e. the SA 
>>> "EnvelopeFrom" pseudo-header) into that list, maybe even first. That 
>>> will make many messages sent via discussion mailing lists (such as 
>>> this one) pass your test where a test of real header domains would 
>>> fail, while it is more likely to cause commercial bulk mail to 
>>> fail where it would usually pass based on real standard headers. 
>>> (That's based on a hunch, not testing.)
>>
>> Hmmm. I'll have to give some more thought into the exact headers it 
>> decides to test. I'm not sure if my MTA puts in envelope info into 
>> the SA request or not. For sake of simplicity right now I might just 
>> ignore mailing lists, I don't know. What I do know is that in the 
>> spam messages I'm reviewing right now, the reply-to / from headers 
>> set often don't have websites at those domains and none of them are 
>> masquerading as mailing lists. I haven't thought through the 
>> situation with mailing lists yet.
>>
>> I'm new to this whole SA plugin dev process - can you suggest the 
>> best way to log the full requests that SA receives so I can see what 
>> info it is getting and what I have to work with?
>
> The best way to see far too much information about what SA is doing is 
> to add a "-D all" to the invocation of the spamassassin script. You 
> can also add that to the flags used by spamd, if you want to punish 
> your logging subsystem
>



Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Bill Cole <sa...@billmail.scconsult.com>.
Please respect my consciously set Reply-To header. I don't ever need 2 
copies of a message posted to a mailing list, and ignoring that header 
is rude.

On 28 Feb 2019, at 13:28, Mike Marynowski wrote:

> On 2/28/2019 12:41 PM, Bill Cole wrote:
>> You should probably put the envelope sender (i.e. the SA 
>> "EnvelopeFrom" pseudo-header) into that list, maybe even first. That 
>> will make many messages sent via discussion mailing lists (such as 
>> this one) pass your test where a test of real header domains would 
>> fail, while it is more likely to cause commercial bulk mail to 
>> fail where it would usually pass based on real standard headers. 
>> (That's based on a hunch, not testing.)
>
> Hmmm. I'll have to give some more thought into the exact headers it 
> decides to test. I'm not sure if my MTA puts in envelope info into the 
> SA request or not. For sake of simplicity right now I might just 
> ignore mailing lists, I don't know. What I do know is that in the spam 
> messages I'm reviewing right now, the reply-to / from headers set 
> often don't have websites at those domains and none of them are 
> masquerading as mailing lists. I haven't thought through the situation 
> with mailing lists yet.
>
> I'm new to this whole SA plugin dev process - can you suggest the best 
> way to log the full requests that SA receives so I can see what info 
> it is getting and what I have to work with?

The best way to see far too much information about what SA is doing is 
to add a "-D all" to the invocation of the spamassassin script. You can 
also add that to the flags used by spamd, if you want to punish your 
logging subsystem.


Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
On 2/28/2019 12:41 PM, Bill Cole wrote:
> You should probably put the envelope sender (i.e. the SA 
> "EnvelopeFrom" pseudo-header) into that list, maybe even first. That 
> will make many messages sent via discussion mailing lists (such as 
> this one) pass your test where a test of real header domains would 
> fail, while it is more likely to cause commercial bulk mail to fail 
> where it would usually pass based on real standard headers. (That's 
> based on a hunch, not testing.)

Hmmm. I'll have to give some more thought to the exact headers it 
decides to test. I'm not sure whether my MTA passes envelope info into 
the SA request. For the sake of simplicity, right now I might just ignore 
mailing lists, I don't know. What I do know is that in the spam messages 
I'm reviewing right now, the reply-to / from headers set often don't 
have websites at those domains and none of them are masquerading as 
mailing lists. I haven't thought through the situation with mailing 
lists yet.

I'm new to this whole SA plugin dev process - can you suggest the best 
way to log the full requests that SA receives so I can see what info it 
is getting and what I have to work with?


Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 28 Feb 2019, at 11:33, Mike Marynowski wrote:

> Question though - what is your reply-to address set to in the emails 
> coming from your email-only domain?

I can't answer for Ralph, but in my case I use a mail-only domain in 
From for most of my personal mail, and while I usually set Reply-To to 
list submission addresses when posting to a mailing list (because some 
mail clients honor it...) I NEVER have a Reply-To on non-list mail. For 
mailing lists I administer, aside from targeted DMARC workarounds 
affecting a small subset of members, there is also no Reply-To forced. 
Users can set it as they like. Note that for mailing lists, the From 
header domain normally doesn't match the envelope sender domain.

> The domain checking I'm doing grabs the first available address in 
> this order: reply-to, from, sender. It's not using the domain of the 
> SMTP server. I did come across some email-only domain SENDERS in my 
> sampling, but the overwhelming majority of reply-to addresses pointed 
> to emails with HTTP servers on their domains.

You should probably put the envelope sender (i.e. the SA "EnvelopeFrom" 
pseudo-header) into that list, maybe even first. That will make many 
messages sent via discussion mailing lists (such as this one) pass your 
test where a test of real header domains would fail, while it is more 
likely to cause commercial bulk mail to fail where it would usually pass 
based on real standard headers. (That's based on a hunch, not testing.)

> On 2/28/2019 11:14 AM, Ralph Seichter wrote:
>> * Grant Taylor:
>>
>>> Why would you do it per email? I would think that you would do the
>>> test and cache the results for some amount of time.
>> I would not do it at all, caching or no caching. Personally, I don't 
>> see
>> a benefit trying to correlate email with a website, as mentioned 
>> before,
>> based on how we utilise email-only-domains.
>>
>> -Ralph

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Ralph Seichter <ab...@monksofcool.net>.
* Mike Marynowski:

> Question though - what is your reply-to address set to in the emails
> coming from your email-only domain?

We very rarely inject Reply-To, because this might interfere with what
the original sender intended.

-Ralph

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
You'll be able to decide how you want to prioritize the fields - I've 
implemented it as a DNS server, so which domain you decide to send to 
the DNS server is entirely up to you.

On 2/28/2019 10:23 PM, Grant Taylor wrote:
> On 2/28/19 9:33 AM, Mike Marynowski wrote:
>> I'm doing grabs the first available address in this order: reply-to, 
>> from, sender.
>
> That sounds like it might be possible to game things by playing with 
> the order.
>
> I'm not sure what sorts of validations are applied to the Sender: 
> header.  (I don't remember if DMARC checks the Sender: header or not.)
>
> How would your filter respond if the MAIL FROM: and the From: header 
> were set to something that didn't have a website, yet had a Sender: 
> header with <something>@gmail.com listed before the Reply-To: and 
> From: headers?
>
>
>



Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Grant Taylor <gt...@tnetconsulting.net>.
On 2/28/19 9:33 AM, Mike Marynowski wrote:
> I'm doing grabs the first available address in this order: reply-to, 
> from, sender.

That sounds like it might be possible to game things by playing with the 
order.

I'm not sure what sorts of validations are applied to the Sender: 
header.  (I don't remember if DMARC checks the Sender: header or not.)

How would your filter respond if the MAIL FROM: and the From: header 
were set to something that didn't have a website, yet had a Sender: 
header with <something>@gmail.com listed before the Reply-To: and From: 
headers?



-- 
Grant. . . .
unix || die


Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
Question though - what is your reply-to address set to in the emails 
coming from your email-only domain?

The domain checking I'm doing grabs the first available address in this 
order: reply-to, from, sender. It's not using the domain of the SMTP 
server. I did come across some email-only domain SENDERS in my sampling, 
but the overwhelming majority of reply-to addresses pointed to emails 
with HTTP servers on their domains.
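That selection order could be sketched like so (illustrative Python only; the header dict and regex are assumptions, not the actual plugin code):

```python
import re

def pick_check_domain(headers):
    # Use the first populated header in priority order:
    # Reply-To, then From, then Sender (per the scheme described
    # above); return its domain, or None if none are present.
    for name in ("Reply-To", "From", "Sender"):
        value = headers.get(name)
        if value:
            match = re.search(r"@([A-Za-z0-9.-]+)", value)
            if match:
                return match.group(1).lower().rstrip(".")
    return None

print(pick_check_domain({"From": "Mike <mike@singulink.com>"}))
# → singulink.com
```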

On 2/28/2019 11:14 AM, Ralph Seichter wrote:
> * Grant Taylor:
>
>> Why would you do it per email? I would think that you would do the
>> test and cache the results for some amount of time.
> I would not do it at all, caching or no caching. Personally, I don't see
> a benefit trying to correlate email with a website, as mentioned before,
> based on how we utilise email-only-domains.
>
> -Ralph



Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Ralph Seichter <ab...@monksofcool.net>.
* Grant Taylor:

> Why would you do it per email? I would think that you would do the
> test and cache the results for some amount of time.

I would not do it at all, caching or no caching. Personally, I don't see
a benefit trying to correlate email with a website, as mentioned before,
based on how we utilise email-only-domains.

-Ralph

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Grant Taylor <gt...@tnetconsulting.net>.
On 02/27/2019 03:25 PM, Ralph Seichter wrote:
> We use some of our domains specifically for email, with no associated 
> website.

I agree that /requiring/ a website at one of the parent domains 
(stopping before traversing into the Public Suffix List) is problematic 
and prone to false positives.

There /may/ be some value to /some/ people in doing such a check and 
altering the spam score.  (See below.)

> Besides, I think the overhead to establish a HTTPS connection for 
> every incoming email would be prohibitive.

Why would you do it per email?  I would think that you would do the test 
and cache the results for some amount of time.

> There is a reason most whitelist/blacklist services use "cheap" DNS 
> queries instead.

I wonder if there is a way to hack DNS into doing this for us.  I.e. a 
custom DNS "server" (BIND's DLZ comes to mind) that can perform the 
test(s) and fabricate an answer that could then be cached.  Publish 
these answers in a new zone / domain name, and treat it like another RBL.

Meaning a query goes to the new RBL server, which does the necessary 
$MAGIC to return an answer (possibly NXDOMAIN if there is a site and 
127.0.0.1 if there is no site) which can be cached by standard local / 
recursive DNS servers.
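The verdict logic for such a responder could be as small as this (a sketch following the 127.0.0.1/NXDOMAIN convention above; the HTTP probe is passed in as a callable so the result can be cached or stubbed):

```python
def rbl_answer(domain, http_check):
    # RBL-style semantics: None (-> NXDOMAIN) when the domain has a
    # working website, "127.0.0.1" (listed) when it does not.
    # http_check is a callable so the probe result can be cached
    # or stubbed for testing.
    return None if http_check(domain) else "127.0.0.1"

# Stub standing in for a real HTTP probe:
has_site = {"singulink.com"}.__contains__
print(rbl_answer("singulink.com", has_site))   # → None (NXDOMAIN)
print(rbl_answer("nosite.example", has_site))  # → 127.0.0.1
```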



-- 
Grant. . . .
unix || die


Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Andrea Venturoli <ml...@netfence.it>.
On 2/28/19 3:40 PM, Mike Marynowski wrote:

> Right now the test plugin I've built makes a single HTTP request for 
> each email while I evaluate this but I'll be building a DNS query 
> endpoint or a local domain cache to make it more efficient before 
> putting it into production.

Please keep us updated: I love the idea.

  bye & Thanks
	av.

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
Just one more note - I've excluded .email domains from the check, as I've 
noticed several organizations using them as email-only domains.

Right now the test plugin I've built makes a single HTTP request for 
each email while I evaluate this but I'll be building a DNS query 
endpoint or a local domain cache to make it more efficient before 
putting it into production.
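One possible shape for that local cache (a Python sketch with an injectable probe and clock; names and defaults are illustrative, not the actual plugin):

```python
import time

class DomainCheckCache:
    # Tiny TTL cache so each root domain is probed at most once per
    # `ttl` seconds; a production version would also need locking
    # and a size bound.
    def __init__(self, check, ttl=4 * 86400, clock=time.monotonic):
        self.check = check    # callable: domain -> bool (has website)
        self.ttl = ttl
        self.clock = clock    # injectable for testing
        self._cache = {}      # domain -> (verdict, expiry)

    def has_website(self, domain):
        now = self.clock()
        hit = self._cache.get(domain)
        if hit is not None and hit[1] > now:
            return hit[0]
        verdict = self.check(domain)
        self._cache[domain] = (verdict, now + self.ttl)
        return verdict
```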



Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Ralph Seichter <ab...@monksofcool.net>.
* Mike Marynowski:

> Of the 100 last legitimate email domains that have sent me mail, 100%
> of them have working websites at the root domain.

We use some of our domains specifically for email, with no associated
website. Besides, I think the overhead to establish a HTTPS connection
for every incoming email would be prohibitive. There is a reason most
whitelist/blacklist services use "cheap" DNS queries instead.

-Ralph

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
Thank you! I have no idea how I missed that...

On 3/13/2019 7:11 PM, RW wrote:
> On Wed, 13 Mar 2019 17:40:57 -0400
> Mike Marynowski wrote:
>
>> Can someone help me form the correct SOA record in my DNS responses
>> to ensure the NXDOMAIN responses get cached properly? Based on the
>> logs I don't think downstream DNS servers are caching it as requests
>> for the same valid HTTP domains keep hitting the service instead of
>> being cached for 4 days.
> ...
>> Based on random sampling of responses from other DNS servers this
>> seems correct to me. Nothing I'm reading indicates that TTL factors
>> into the negative caching but is it possible servers are only caching
>> the negative response for 15 mins because of the TTL on the SOA
>> record, using the smaller value between that and the default TTL?
> I believe so, from RFC 2308:
>
> 3 - Negative Answers from Authoritative Servers
>
>     Name servers authoritative for a zone MUST include the SOA record of
>     the zone in the authority section of the response when reporting an
>     NXDOMAIN or indicating that no data of the requested type exists.
>     This is required so that the response may be cached.  The TTL of this
>     record is set from the minimum of the MINIMUM field of the SOA record
>     and the TTL of the SOA itself, and indicates how long a resolver may
>     cache the negative answer.



Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by RW <rw...@googlemail.com>.
On Wed, 13 Mar 2019 17:40:57 -0400
Mike Marynowski wrote:

> Can someone help me form the correct SOA record in my DNS responses
> to ensure the NXDOMAIN responses get cached properly? Based on the
> logs I don't think downstream DNS servers are caching it as requests
> for the same valid HTTP domains keep hitting the service instead of
> being cached for 4 days.
...
> Based on random sampling of responses from other DNS servers this
> seems correct to me. Nothing I'm reading indicates that TTL factors
> into the negative caching but is it possible servers are only caching
> the negative response for 15 mins because of the TTL on the SOA
> record, using the smaller value between that and the default TTL?

I believe so, from RFC 2308:

3 - Negative Answers from Authoritative Servers

   Name servers authoritative for a zone MUST include the SOA record of
   the zone in the authority section of the response when reporting an
   NXDOMAIN or indicating that no data of the requested type exists.
   This is required so that the response may be cached.  The TTL of this
   record is set from the minimum of the MINIMUM field of the SOA record
   and the TTL of the SOA itself, and indicates how long a resolver may
   cache the negative answer.

Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
Can someone help me form the correct SOA record in my DNS responses to 
ensure the NXDOMAIN responses get cached properly? Based on the logs, I 
don't think downstream DNS servers are caching them: requests for the 
same valid HTTP domains keep hitting the service instead of being cached 
for 4 days.

From what I understand, if you want an NXDOMAIN response to be cached, 
you need to include an SOA record with the response, and DNS servers 
should use the min/default TTL value as a negative cache hint. My 
NXDOMAIN responses currently look like this:

     HEADER:
         opcode = QUERY, id = 27, rcode = NXDOMAIN
         header flags:  response, want recursion, recursion avail.
         questions = 1,  answers = 0,  authority records = 1, additional = 0

     QUESTIONS:
         www.singulink.com.httpcheck.singulink.com, type = A, class = IN
     AUTHORITY RECORDS:
     ->  httpcheck.singulink.com
         ttl = 900 (15 mins)
         primary name server = httpcheck.singulink.com
         responsible mail addr = admin.singulink.com
         serial  = 4212294798
         refresh = 172800 (2 days)
         retry   = 86400 (1 day)
         expire  = 2592000 (30 days)
         default TTL = 345600 (4 days)

Based on a random sampling of responses from other DNS servers, this 
seems correct to me. Nothing I'm reading indicates that TTL factors into 
negative caching, but is it possible servers are only caching the 
negative response for 15 minutes because of the TTL on the SOA record 
itself, using the smaller of that value and the default TTL?
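If the 15-minute cap is indeed coming from the SOA record's own TTL, one 
fix is to raise that TTL to match the MINIMUM field, so min(TTL, MINIMUM) 
becomes the full 4 days. A hedged zone-file sketch using the names and 
serial from the dump above (the exact change depends on how the custom 
DNS service emits its records):

```
; Assumption: resolvers cache NXDOMAIN for min(SOA TTL, SOA MINIMUM),
; so the SOA's own TTL is raised from 900 to 345600 (4 days).
httpcheck.singulink.com. 345600 IN SOA httpcheck.singulink.com. admin.singulink.com. (
        4212294798  ; serial
        172800      ; refresh (2 days)
        86400       ; retry (1 day)
        2592000     ; expire (30 days)
        345600 )    ; minimum, also the negative-cache cap (4 days)
```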


Re: Spam rule for HTTP/HTTPS request to sender's root domain

Posted by Mike Marynowski <mi...@singulink.com>.
Changing up the algorithm a bit. Once a domain has been added to the 
cache, the DNS service will automatically perform HTTP checks in the 
background, on a much more aggressive schedule for invalid domains. This 
makes temporary website problems much less of an issue, and invalid 
domains no longer delay mail delivery threads for up to 15 seconds after 
TTL expirations during the initial test period with its progressively 
increasing TTLs. After the first query, queries can always return 
instantly, as long as the domain has been queried in the last 30 days 
and is still in the cache.
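One way to read that scheduling rule is below; the exact intervals are 
illustrative assumptions, not values from the thread:

```python
def recheck_interval(is_valid: bool, consecutive_failures: int = 0) -> int:
    """Pick the next background HTTP-check interval, in seconds.

    Valid domains settle at the long cache TTL; invalid domains are
    rechecked aggressively, backing off as failures accumulate so a
    briefly-down site recovers its standing quickly.
    """
    if is_valid:
        return 4 * 24 * 3600  # settle at the 4-day cache TTL
    # Aggressive schedule for invalid domains: 15 min, 30 min, 1 h, ...
    # capped at one day between checks.
    return min(900 * 2 ** consecutive_failures, 24 * 3600)

print(recheck_interval(True))      # 345600 (4 days)
print(recheck_interval(False, 0))  # 900 (15 minutes)
print(recheck_interval(False, 3))  # 7200 (2 hours)
```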

Domains deemed to have "invalid" websites will be rechecked much more 
aggressively in the background so that newly queried domains with 
temporary website issues stop tripping this filter as soon as possible. 
There will be a "sliding window" of a few days during which temporary 
website issues won't cause the filter to trip; a domain just needs to 
provide a valid response sometime during the window to stay in good 
standing.
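A minimal sketch of that sliding-window rule (class and method names, and 
the exact window length, are assumptions for illustration):

```python
import time

WINDOW = 3 * 24 * 3600  # assumed "few days" sliding window, in seconds

class DomainStatus:
    """Tracks the last time a domain's root website answered an HTTP check."""

    def __init__(self):
        self.last_valid_response = None  # epoch seconds, or None if never

    def record_check(self, succeeded: bool, now: float = None):
        """Record the outcome of a background HTTP check."""
        now = time.time() if now is None else now
        if succeeded:
            self.last_valid_response = now

    def trips_filter(self, now: float = None) -> bool:
        """The filter trips only if the domain produced no valid
        response at any point during the sliding window."""
        now = time.time() if now is None else now
        if self.last_valid_response is None:
            return True
        return (now - self.last_valid_response) > WINDOW

# A site that answered 2 days ago stays in good standing;
# one silent for 5 days trips the filter.
s = DomainStatus()
s.record_check(True, now=0)
print(s.trips_filter(now=2 * 24 * 3600))  # False
print(s.trips_filter(now=5 * 24 * 3600))  # True
```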