
Posted to blogspam@spamassassin.apache.org by Bob Apthorpe <ap...@cynistar.net> on 2005/02/27 01:48:12 UTC

Proof-of-concept: log analyzer & blacklist generator

Hi,

It took about a week, but I have the first rudiments of a web log
analyzer and blacklist generator completed.

The design and operation documents are at

http://wiki.austinimprov.com/aiwiki/DistributedWebserverDefense

and

http://wiki.austinimprov.com/aiwiki/DistributedWebserverDefense_2fBlacklistPolicyImplementation

I've implemented the barest essentials of a policy (blacklist)
management system. Here's how it works:

Webserver (Apache) logs are stored in a database (MySQL) using
mod_log_sql or are uploaded manually using mysql_import_combined_log.pl,
a contrib script from the mod_log_sql distribution.
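As a rough illustration of the manual-import path, here is a minimal Python sketch that parses Apache "combined" log lines and loads them into a table. It is not mysql_import_combined_log.pl or mod_log_sql's actual schema. sqlite3 stands in for MySQL, and the table name `access_log` and its column names are hypothetical.

```python
import re
import sqlite3
from datetime import datetime

# Apache "combined" format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_combined(line):
    """Return a dict of fields for one combined-format line, or None."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Timestamps look like 27/Feb/2005:01:48:12 -0600
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
    return rec

def load_lines(conn, lines):
    """Insert parsed lines into a hypothetical access_log table; return count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS access_log ("
        "remote_host TEXT, time_stamp TEXT, request TEXT, status INTEGER, "
        "bytes_sent INTEGER, referer TEXT, agent TEXT)"
    )
    count = 0
    for line in lines:
        rec = parse_combined(line)
        if rec is None:
            continue  # skip malformed lines instead of aborting the import
        conn.execute(
            "INSERT INTO access_log VALUES (?, ?, ?, ?, ?, ?, ?)",
            (rec["host"], rec["time"].isoformat(), rec["request"],
             rec["status"], rec["bytes"], rec["referer"], rec["agent"]),
        )
        count += 1
    conn.commit()
    return count
```

Skipping malformed lines, rather than failing, is a judgment call: real logs often contain truncated or garbage entries.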

Using load_working_set.php, you select webserver accesslog tables from
within MySQL, set the range of data to analyze (up to 730 days ago to
present), and import the records into the analysis database.
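The date-range selection can be sketched as a simple filter. This is a minimal Python sketch, assuming each record carries a timezone-aware datetime under a `"time"` key; the function name `working_set` is hypothetical and does not reflect what load_working_set.php does internally. Only the 730-day limit comes from the post.

```python
from datetime import datetime, timedelta, timezone

MAX_DAYS = 730  # the analysis window can reach back at most 730 days

def working_set(records, days, now=None):
    """Keep records stamped within the last `days` days (1..MAX_DAYS)."""
    if not 1 <= days <= MAX_DAYS:
        raise ValueError(f"days must be between 1 and {MAX_DAYS}")
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    # Each record is assumed to have a timezone-aware datetime under "time".
    return [r for r in records if cutoff <= r["time"] <= now]
```

Passing `now` explicitly keeps the selection reproducible, which helps when re-running an analysis against the same snapshot.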

There's currently one tool for analyzing the traffic -
scan_referers.php. This sorts referer hosts by frequency, dropping empty
referers and those that match the virtual host (intrasite referers.) The
hosts can be marked as one of four colors (white, gray, black, or beige)
which correspond to welcome, untrusted, unwelcome, and unlisted. Referer
hosts are grouped in blocks of 25 to keep editing and display manageable.
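The referer scan described above reduces to "count hosts, skip empty and intrasite referers, rank, and page in blocks of 25." This is a minimal Python sketch of that idea, not scan_referers.php itself; the function names are hypothetical, and treating "intrasite" as an exact hostname match with the virtual host is an assumption.

```python
from collections import Counter
from urllib.parse import urlsplit

BLOCK_SIZE = 25  # hosts per page, as in the post

def referer_host_counts(referers, vhost):
    """Count referer hosts, skipping empty and intrasite referers."""
    counts = Counter()
    vhost = vhost.lower()
    for ref in referers:
        if not ref or ref == "-":
            continue  # Apache logs a missing Referer header as "-"
        host = (urlsplit(ref).hostname or "").lower()
        if not host or host == vhost:
            continue  # unparseable or intrasite referer
        counts[host] += 1
    return counts

def in_blocks(counts, size=BLOCK_SIZE):
    """Yield lists of (host, count), most frequent first, `size` per block."""
    ranked = counts.most_common()
    for i in range(0, len(ranked), size):
        yield ranked[i:i + size]
```

For example, with 30 distinct referer hosts, `in_blocks` yields one block of 25 and one of 5.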

New referer hosts (those not currently in the policy database) are
marked as beige by default. Color changes are stored in the database
along with a timestamp, allowing blacklists that "age out" old data.
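Here is one way to model the color history and "aging out." This is a minimal in-memory Python sketch, assuming a host's current color is its most recent change and that a blacklist includes only hosts whose latest change to black falls within a chosen age. The class and method names are hypothetical; the real system keeps this in MySQL.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Colors from the post and what each one means.
COLORS = {"white": "welcome", "gray": "untrusted",
          "black": "unwelcome", "beige": "unlisted"}

@dataclass
class PolicyDB:
    # host -> list of (timestamp, color) changes; the full history is kept
    history: dict = field(default_factory=dict)

    def _latest(self, host):
        entries = self.history.get(host)
        return max(entries, key=lambda e: e[0]) if entries else None

    def color(self, host):
        """Current color of a host; hosts never seen before are beige."""
        latest = self._latest(host)
        return latest[1] if latest else "beige"

    def set_color(self, host, color, when=None):
        """Record a color change together with its timestamp."""
        if color not in COLORS:
            raise ValueError(f"unknown color: {color}")
        if when is None:
            when = datetime.now(timezone.utc)
        self.history.setdefault(host, []).append((when, color))

    def blacklist(self, max_age_days, now=None):
        """Hosts whose latest color is black and was set within max_age_days."""
        if now is None:
            now = datetime.now(timezone.utc)
        cutoff = now - timedelta(days=max_age_days)
        result = []
        for host in self.history:
            when, color = self._latest(host)
            if color == "black" and when >= cutoff:
                result.append(host)
        return sorted(result)
```

Keeping every change, rather than overwriting the color, is what makes aging possible. It also leaves room for the case logging mentioned later in the post.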

Finally, once the referer hosts have been analyzed and assigned colors,
one can generate blacklists in several formats, suitable for use with
several software packages. First, there's the ubiquitous text file,
which can be sorted alphabetically or by second-level domain. Next is
the optimized perl regular expression (see Regexp::List for the
insignificant amount of perl code this requires.) Finally, there's the
rbldnsd zone file, so you can operate your very own private version of
SURBL. rbldnsd can export zone files in BIND format using the -d flag,
so it should be a simple matter of adding BIND-formatted blacklists.
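The three output formats can be sketched as small generators. This is a minimal Python sketch with hypothetical function names. Three caveats apply. First, the "second-level domain" here is naively the last two labels, which is wrong for names under registries like co.uk. Second, the regular expression is a plain alternation, not the trie-optimized pattern Regexp::List builds. Third, the rbldnsd output assumes the dnset dataset's one-name-per-line layout with `#` comments; check rbldnsd(8) before relying on it.

```python
import re

def sld(host):
    """Naive second-level domain: the last two labels (wrong for co.uk etc.)."""
    return ".".join(host.lower().split(".")[-2:])

def text_list(hosts, by_sld=False):
    """One host per line, sorted alphabetically or grouped by naive SLD."""
    key = (lambda h: (sld(h), h)) if by_sld else None
    return "\n".join(sorted(hosts, key=key)) + "\n"

def perl_style_regex(hosts):
    """Anchored alternation of escaped hosts, longest first (unoptimized)."""
    alts = sorted(set(hosts), key=lambda h: (-len(h), h))
    return "^(?:" + "|".join(re.escape(h) for h in alts) + ")$"

def rbldnsd_dnset(hosts, comment="generated blacklist"):
    """Exact host names, one per line, under a comment header."""
    lines = ["# " + comment]
    lines += sorted(set(h.lower() for h in hosts))
    return "\n".join(lines) + "\n"
```

Sorting by second-level domain puts sibling hosts (for example, several hosts under one spam domain) next to each other. That makes it easier to spot candidates for a domain-wide listing.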

For those brave enough to want to play with it, the first rev of the
code is at:

http://www.austinimprov.com/~apthorpe/code/policy_manager/blacklist.tar.gz

In short, the system is functional but far from being complete; it's
much more than a pipe dream but it needs a lot more work to reach its
full potential as an easy-to-use blacklist management system.

What needs to be done:

- Refactor the code with an eye towards modularity and extensibility
(OOD? The current code is procedural because that's what makes the most
sense to me. It isn't necessarily the best way of doing things.)

- Create more and better analysis tools. Referer frequency is the tip of
the iceberg - see Section 2 of
http://wiki.austinimprov.com/aiwiki/DistributedWebserverDefense_2fBlacklistPolicyImplementation

- More and better policy management tools (escalating host -> domain
listings, IP -> network listings, etc.), better time-stamping, case
logging ("This domain was listed for <this reason> based on <this
evidence>, signed <incident handler>")

- Clean up the UI (CSS)

- Integrate with popular blogs, CMSs, wikis, message boards/forums, etc.
So many tools are based on MySQL, it should be easy to write filters and
queries to import abuse evidence into the policy management system.
Similarly, so many of these community tools can use DNSBLs, text lists,
and regular expressions for blocking, it should be easy for interested
parties to develop output filters to make use of the policy data.
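The "escalating host -> domain listings, IP -> network listings" item could start from a threshold rule. This is a minimal Python sketch in which the rule itself is a hypothetical policy: list the whole domain, or the whole IPv4 /24, once at least `threshold` members are listed. It uses the same naive last-two-labels notion of a domain, and the function names are illustrative only.

```python
import ipaddress
from collections import defaultdict

def escalate_hosts(hosts, threshold=3):
    """Replace >= threshold listed hosts under one naive SLD with the SLD."""
    by_domain = defaultdict(set)
    for h in hosts:
        h = h.lower()
        by_domain[".".join(h.split(".")[-2:])].add(h)
    out = set()
    for domain, members in by_domain.items():
        if len(members) >= threshold:
            out.add(domain)  # escalate to a domain listing
        else:
            out |= members
    return sorted(out)

def escalate_ips(ips, prefix=24, threshold=3):
    """Replace >= threshold listed IPv4 addresses in one /prefix with the network."""
    by_net = defaultdict(set)
    for ip in ips:
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        by_net[net].add(ipaddress.ip_address(ip))
    out = []
    for net, members in by_net.items():
        if len(members) >= threshold:
            out.append(net)  # escalate to a network listing
        else:
            out.extend(ipaddress.ip_network(a) for a in members)  # /32s
    return sorted(out)
```

A real policy would probably also record which evidence triggered each escalation, in line with the case-logging idea above.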

This system should be easy to install, maintain, and extend. I would
like it released under an OSI-approved license. I'm not trying to
replicate Snort or a bona-fide IDS. I'm trying to detect and deter
normal activity from unwanted sources or of an excessive scale, not
protect against DoS attacks and penetration attempts (contrast a mall
cop with an actual policeman.)

I'm looking for a few developers to help me with this project,
including someone comfortable with databases (MySQL & cross-db
portability a plus), people comfortable with PHP and perl (I'm using
Smarty templates because I like a clean separation between analysis and
presentation), and someone who's really interested in reputation/trust
management systems and anonymous secure P2P systems. No, really.

Contact me privately if you're interested or if you want to see the
system demo.

Thanks much,

--
Bob Apthorpe
<ap...@cynistar.net>