Posted to blogspam@spamassassin.apache.org by Bob Apthorpe <ap...@cynistar.net> on 2005/02/27 01:48:12 UTC

Proof-of-concept: log analyzer & blacklist generator

Hi,

It took about a week but I have the first rudiments of a web log 
analyzer and blacklist generator completed.

The design and operation documents are at

http://wiki.austinimprov.com/aiwiki/DistributedWebserverDefense

and

http://wiki.austinimprov.com/aiwiki/DistributedWebserverDefense_2fBlacklistPolicyImplementation

I've implemented the barest essentials of a policy (blacklist) 
management system. Here's how it works:

Webserver (Apache) logs are stored in a database (MySQL) using 
mod_log_sql or are uploaded manually using mysql_import_combined_log.pl, 
a contrib script from the mod_log_sql distribution.

Using load_working_set.php, you select webserver access log tables from 
within MySQL, set the range of data to analyze (from up to 730 days ago 
to the present), and import the records into the analysis database.

There's currently one tool for analyzing the traffic: 
scan_referers.php. It sorts referer hosts by frequency, dropping empty 
referers and those that match the virtual host (intrasite referers). 
Each host can be marked as one of four colors (white, gray, black, or 
beige), which correspond to welcome, untrusted, unwelcome, and unlisted, 
respectively. Referer hosts are grouped in blocks of 25 to keep editing 
and display manageable.
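
To make the referer pass concrete, here is a rough Python sketch of the 
same idea. It is not the code in scan_referers.php (which is PHP and 
reads from MySQL); the function names and the (virtual_host, referer) 
record shape are assumptions made up for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def referer_host(referer):
    """Return the lowercased host of a referer URL, or None if there is none."""
    if not referer or referer == "-":
        return None  # Apache logs an absent referer as "-"
    try:
        host = urlsplit(referer).hostname
    except ValueError:
        return None  # malformed URL, e.g. a broken IPv6 literal
    return host.lower() if host else None


def rank_referer_hosts(records):
    """Count external referer hosts, most frequent first.

    `records` is an iterable of (virtual_host, referer) pairs; this shape
    is hypothetical, not the mod_log_sql schema.
    """
    counts = Counter()
    for virtual_host, referer in records:
        host = referer_host(referer)
        if host is None or host == virtual_host.lower():
            continue  # drop empty and intrasite referers
        counts[host] += 1
    return counts.most_common()


def pages(ranked, size=25):
    """Split the ranked list into blocks of `size` hosts for display."""
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]
```

Each block returned by pages() would correspond to one editing screen 
of 25 hosts.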

New referer hosts (those not currently in the policy database) are 
marked as beige by default. Each color change is stored in the database 
along with a timestamp, which lets blacklists "age out" old data.
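
As an illustration of the color and timestamp bookkeeping, here is a 
minimal in-memory Python sketch. The real system keeps this in MySQL; 
the Policy class, its method names, and the max_age parameter are all 
hypothetical and are not taken from the tarball.

```python
from datetime import datetime, timedelta

# The four colors and their meanings, as described above.
COLORS = {
    "white": "welcome",
    "gray": "untrusted",
    "black": "unwelcome",
    "beige": "unlisted",
}


class Policy:
    """Toy stand-in for the policy database: host -> (color, changed_at)."""

    def __init__(self):
        self._entries = {}

    def color_of(self, host):
        # Hosts with no entry are treated as beige (unlisted) by default.
        return self._entries.get(host, ("beige", None))[0]

    def set_color(self, host, color, when):
        if color not in COLORS:
            raise ValueError("unknown color: %r" % color)
        # Store the time of the change so old listings can expire later.
        self._entries[host] = (color, when)

    def blacklist(self, now, max_age):
        """Return black hosts whose listing is no older than max_age."""
        return sorted(
            host
            for host, (color, changed_at) in self._entries.items()
            if color == "black" and now - changed_at <= max_age
        )
```

For example, blacklist(now, timedelta(days=90)) would leave out any 
host that was marked black more than 90 days before now.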

Finally, once the referer hosts have been analyzed and assigned colors, 
one can generate blacklists in several formats for use with a variety of 
software packages. First, there's the ubiquitous text file, which can be 
sorted alphabetically or by second-level domain. Next is the optimized 
Perl regular expression (see Regexp::List for the insignificant amount 
of Perl code this requires). Finally, there's the rbldnsd zone file, so 
you can operate your very own private version of SURBL. rbldnsd can 
export zone files in BIND format using the -d flag, so adding 
BIND-formatted blacklists should be a simple matter.
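
Here is a rough Python sketch of the first two output formats. It is 
only an illustration, not the project's code: the function names are 
made up, the second-level domain is naively taken as the last two 
labels (so names under suffixes like co.uk will group poorly), and the 
regular expression is a plain escaped alternation, not the optimized 
pattern Regexp::List produces.

```python
import re


def sld_key(host):
    """Sort key: naive second-level domain first, then the full host name."""
    labels = host.split(".")
    return (".".join(labels[-2:]), host)


def text_blacklist(hosts, by_domain=False):
    """Render hosts one per line, sorted alphabetically or by domain."""
    unique = {h.lower() for h in hosts}
    ordered = sorted(unique, key=sld_key if by_domain else None)
    return "\n".join(ordered) + "\n"


def blacklist_regex(hosts):
    """Compile a regex matching exactly the listed hosts (unoptimized)."""
    unique = sorted({h.lower() for h in hosts})
    if not unique:
        return re.compile(r"(?!)")  # empty list: match nothing
    # re.escape keeps the dots in host names from matching any character.
    alternatives = "|".join(re.escape(h) for h in unique)
    return re.compile(r"^(?:" + alternatives + r")$", re.IGNORECASE)
```

Sorting by domain keeps all hosts under one registered name together, 
which makes it easier to spot when a whole domain deserves listing.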

For those brave enough to want to play with it, the first rev of the 
code is at:

http://www.austinimprov.com/~apthorpe/code/policy_manager/blacklist.tar.gz

In short, the system is functional but far from being complete; it's 
much more than a pipe dream but it needs a lot more work to reach its 
full potential as an easy-to-use blacklist management system.

What needs to be done:

- Refactor the code with an eye towards modularity and extensibility 
(OOD? The current code is procedural because that's what makes the most 
sense to me. It isn't necessarily the best way of doing things.)

- Create more and better analysis tools. Referer frequency is the tip of 
the iceberg - see Section 2 of 
http://wiki.austinimprov.com/aiwiki/DistributedWebserverDefense_2fBlacklistPolicyImplementation

- More and better policy management tools (escalating host -> domain 
listings, IP -> network listings, etc.), better time-stamping, case 
logging ("This domain was listed for <this reason> based on <this 
evidence>, signed <incident handler>")

- Clean up the UI (CSS)

- Integrate with popular blogs, CMSs, wikis, message boards/forums, etc. 
So many tools are based on MySQL, it should be easy to write filters and 
queries to import abuse evidence into the policy management system. 
Similarly, so many of these community tools can use DNSBLs, text lists, 
and regular expressions for blocking, it should be easy for interested 
parties to develop output filters to make use of the policy data.

This system should be easy to install, maintain, and extend. I would 
like it released under an OSI-approved license. I'm not trying to 
replicate Snort or a bona fide IDS. I'm trying to detect and deter 
otherwise normal activity that comes from unwanted sources or occurs at 
an excessive scale, not to protect against DoS attacks and penetration 
attempts (contrast a mall cop with an actual policeman).

I'm looking for a few developers to help me with this project: 
preferably people comfortable with databases (MySQL and cross-db 
portability a plus), people comfortable with PHP and Perl (I'm using 
Smarty templates because I like a clean separation between analysis and 
presentation), and people who are really interested in reputation/trust 
management systems and anonymous secure P2P systems. No, really.

Contact me privately if you're interested or if you want to see the 
system demo.

Thanks much,

-- 
Bob Apthorpe
<ap...@cynistar.net>