Posted to users@httpd.apache.org by Marc <Ma...@f1-outsourcing.eu> on 2023/09/12 14:31:58 UTC

[users@httpd] realtime protection against cloud scans

Does anyone have a suggestion on how to block cloud crawlers/bots? Obviously I would like search engine bots to have access, but all the other crap I want to lose. Only 'real users'.

What is best practice for this? Just getting the amazon, googleusercontent, digitalocean, and azure IP ranges and putting them in something like ipset, or are there currently better ways of doing this?
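
To sketch what I mean for the ipset route: Amazon publishes its ranges as ip-ranges.json, so a script can rebuild a set from it. This is a rough sketch only; the set name "cloud4" is made up for the example, and the other providers publish their ranges in their own feeds/formats:

	#!/usr/bin/env python3
	# Rough sketch: pull Amazon's published ranges and feed them to ipset.
	# The set name "cloud4" is invented for this example.
	import json
	import subprocess
	import urllib.request

	URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"  # official AWS feed

	with urllib.request.urlopen(URL) as resp:
	    prefixes = json.load(resp)["prefixes"]

	# Build an "ipset restore" script: create the set, then add every net.
	# (Filter on p["service"] == "EC2" to block only compute ranges.)
	lines = ["create cloud4 hash:net"]
	lines += ["add cloud4 %s" % p["ip_prefix"] for p in prefixes]

	# "-exist" makes re-runs idempotent instead of erroring on duplicates.
	subprocess.run(["ipset", "-exist", "restore"],
	               input="\n".join(lines).encode(), check=True)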



Re: [users@httpd] realtime protection against cloud scans

Posted by me...@newjersey.metaed.com.INVALID.
Marc wrote:
> I still need to get familiar with nft. Currently I am using ipset

NFT has an equivalent -- also called a set. Here are excerpts from my
configuration that show how addresses and ranges appear in a set and how a set
is blocked.

Defining the set of real-time intrusions:

	set SET_IPV4_MAIN_TEMPBLOCK {
		type ipv4_addr
		flags timeout
		elements = { 1.0.171.2, 1.1.110.108, [...], 223.255.161.190 }
	}

Dropping traffic that matches the set:

	chain CHAIN_INET_MAIN_INBOUND {
		type filter hook input priority 0; policy drop;
		ip saddr @SET_IPV4_MAIN_TEMPBLOCK drop
		[...]
	}
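
Because the set above carries "flags timeout", entries can be pushed in at runtime with a lifetime, and nft expires them by itself. A sketch of the add; the table name "inet MAIN" is a guess here, only the set name comes from the excerpt:

	#!/usr/bin/env python3
	# Sketch: push one offender into the temp-block set with a lifetime.
	# "inet MAIN" is a hypothetical table name; adjust to the real ruleset.
	import subprocess

	def tempblock(addr, ttl="24h"):
	    subprocess.run(
	        ["nft", "add", "element", "inet", "MAIN",
	         "SET_IPV4_MAIN_TEMPBLOCK",
	         "{ %s timeout %s }" % (addr, ttl)],
	        check=True)

	tempblock("192.0.2.10")  # RFC 5737 documentation address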

Defining a set of geolocated address blocks:

	set SET_GEO_IPV4_RU {
		type ipv4_addr
		flags interval
		elements = { 2.16.20.0/23, [...], 217.199.236.0-217.199.254.255 }
	}

Dropping traffic that matches such a set:

	chain CHAIN_GEO_IPV4 {
		type filter hook input priority -300; policy accept;
		[...]
		ip saddr @SET_GEO_IPV4_CN drop
		[...]
		ip saddr @SET_GEO_IPV4_RU drop
		[...]
	}

The configuration for IPv6 is analogous.
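
Regenerating such a geo set can be scripted as well. A sketch, assuming the db-ip lite CSV layout of first-address, last-address, country-code rows (not my exact script):

	#!/usr/bin/env python3
	# Sketch: turn a db-ip country CSV into an "nft -f"-loadable set
	# definition shaped like the excerpt above. Assumes rows of exactly
	# first-address, last-address, country-code (the lite CSV layout).
	import csv
	import sys

	COUNTRY = "RU"
	elements = []
	with open(sys.argv[1], newline="") as fh:   # e.g. dbip-country-lite.csv
	    for first, last, cc in csv.reader(fh):
	        if cc == COUNTRY and ":" not in first:   # keep IPv4 rows only
	            elements.append("%s-%s" % (first, last))

	print("set SET_GEO_IPV4_%s {" % COUNTRY)
	print("\ttype ipv4_addr")
	print("\tflags interval")
	print("\telements = { %s }" % ", ".join(elements))
	print("}")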

Marc wrote:
> I am looking for something that can do this automatically.

I have all this scripted and scheduled. It's hands-off, except that I look at
reports from time to time, to see if there is a new intrusion pattern I should
be detecting.

Marc wrote:
> Afaik ipset was very good with latency. I have no idea how this is replaced in nft.

According to what I have read, NFT beats iptables in benchmarks, but I have
not tested it at scale. Personally I can only say the NFT filter is much
faster than I need it to be. The CPU time it costs is so small I cannot
measure it, so I do not see it contributing measurably to latency.

-- 
Cheers!
Edward


RE: [users@httpd] realtime protection against cloud scans

Posted by Marc <Ma...@f1-outsourcing.eu>.
> > using the NTP firewall
> 
> Sorry, using the NFT firewall.
> 

I still need to get familiar with nft. Currently I am using ipset, adding IPs with scripts. But an ipset is created for a specific netmask (/24, /X), so at some point your set of /24s fills up at 65k entries. It would be nice if the /24s were then automatically merged/moved into ipsets holding blocks bigger than /24.

I am looking for something that can do this automatically. 
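
The merging step itself is scriptable: Python's ipaddress.collapse_addresses folds adjacent and overlapping blocks into the fewest covering prefixes. A minimal sketch using documentation ranges:

	#!/usr/bin/env python3
	# Sketch: collapse single addresses and small blocks into the fewest
	# covering CIDR prefixes, so one interval set can replace many
	# fixed-netmask ipsets.
	import ipaddress

	blocked = [
	    "203.0.113.0/24", "203.0.113.128/25",    # overlapping -> one /24
	    "198.51.100.0/25", "198.51.100.128/25",  # adjacent halves -> one /24
	]
	nets = [ipaddress.ip_network(b) for b in blocked]
	for net in ipaddress.collapse_addresses(nets):
	    print(net)
	# prints:
	# 198.51.100.0/24
	# 203.0.113.0/24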

Currently I am thinking of creating multiple ipsets for /16, /18, /22, etc. I don't know if I should just put the corresponding ranges in there from digitalocean, amazon, googleusercontent, and azure, or indeed use IPs from abuse lists, but then I risk that lots are missing and I am still slowly adding these clouds like digitalocean.

Afaik ipset was very good with latency. I have no idea how this is replaced in nft.


Re: [users@httpd] realtime protection against cloud scans

Posted by me...@newjersey.metaed.com.INVALID.
metaed borked:
> using the NTP firewall

Sorry, using the NFT firewall.


Re: [users@httpd] realtime protection against cloud scans

Posted by me...@newjersey.metaed.com.INVALID.
Marc wrote:
> Does anyone have a suggestion on how to block cloud crawlers/bots? Obviously I
> would like search engine bots to have access, but all the other crap I want to
> lose. Only 'real users'.

I take a three-pronged approach, using the NTP firewall and some scripts.

1. db-ip.com keeps a list of IP ranges by geocode, updated monthly. I block
geocodes CN, VN, RU, HK, SG, IN, KR, TW, BR, JP, and ID. I arrived at this list
of geocodes based on observing where most intrusion attempts were actually
coming from. (US, DE, FR, and NL are also on my list, but blocking them would
interfere with intended use.)

2. blocklist.de keeps a list of malicious IPs, updated in near-real-time. I
block these as they appear.

3. I maintain my own list of log signatures that signal malice, such as "GET
/.env" in the Apache log, signatures of crawlers that I do not sanction, or
probes of ports such as SSH, SMTP, and IMAP. I block the originating IPs as they
appear; a sketch follows below. About 50 intrusion attempts are blocked per day
this way.
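
A stripped-down sketch of that third prong. The log path and the combined log format are assumptions; the real thing watches many signatures and feeds the temp-block set directly:

	#!/usr/bin/env python3
	# Sketch: scan the Apache access log for one malice signature and print
	# the offending source addresses, ready to pipe into a blocker. The log
	# path and the combined log format are assumptions.
	import re

	LOG = "/var/log/apache2/access.log"
	SIGNATURE = re.compile(r'"GET /\.env[ "]')

	seen = set()
	with open(LOG) as fh:
	    for line in fh:
	        if SIGNATURE.search(line):
	            addr = line.split()[0]   # client IP: first field in combined
	            if addr not in seen:
	                seen.add(addr)
	                print(addr)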

With these measures in place, nearly all my Apache traffic is intended use.

-- 
Cheers!
Edward


RE: [users@httpd] realtime protection against cloud scans

Posted by Marc <Ma...@f1-outsourcing.eu>.
I would even say that >80% of your server load is crap if you don't block any ranges. Besides that, you open yourself up to vulnerability scans, monitoring for domain hijacking, etc.

> 
> Does the traffic from those cloud ranges have any significant impact on
> your server performance?
> 
> On Tue, Sep 12, 2023 at 10:33 AM Marc <Ma...@f1-outsourcing.eu> wrote:
>
> [...]


Re: [users@httpd] realtime protection against cloud scans

Posted by Frank Gingras <th...@apache.org>.
Does the traffic from those cloud ranges have any significant impact on
your server performance?

On Tue, Sep 12, 2023 at 10:33 AM Marc <Ma...@f1-outsourcing.eu> wrote:

>
> Does anyone have a suggestion on how to block cloud crawlers/bots? Obviously
> I would like search engine bots to have access, but all the other crap I
> want to lose. Only 'real users'.
>
> What is best practice for this? Just getting the amazon, googleusercontent,
> digitalocean, and azure IP ranges and putting them in something like ipset,
> or are there currently better ways of doing this?
>
>
>