Posted to users@spamassassin.apache.org by Skip <sk...@pelorus.org> on 2008/08/03 03:25:51 UTC

Giving Back--A stats script I wrote

This may be kinda simple for you gurus out there, in which case I 
welcome your feedback and suggestions to make this better.  But if 
anyone finds this useful...great!

I wanted a stats tool that would tell me which rules hit most often.  
Which ones ONLY trigger on spam and which ones ONLY trigger on ham?  I 
wanted to know what percentage of my ham was whitelisted.  Are my rule 
scores set high or low enough, and is the required score for the spam 
threshold in the right place?  I wanted something that was flexible and 
powerful.  So I thought about ways to get my SpamAssassin data into 
MySQL.  Look at this screenshot and you'll get the idea:

http://pelorus.org/pictures/mailstats.gif

Obviously, with that type of granularity, I could generate any kind of 
report I wanted. 
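As a rough illustration of the kind of report this makes possible, here 
is a minimal sketch that answers two of the questions above: which rules 
fire ONLY on spam or ONLY on ham, and what percentage of ham was 
whitelisted.  It assumes a hypothetical two-table schema (one row per 
message, one row per rule hit); the table and column names are made up 
and are not taken from my script.  It uses Python's built-in sqlite3 
instead of MySQL so it runs standalone, and the rows are toy data.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory DB with a HYPOTHETICAL schema and toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE messages (
            id          INTEGER PRIMARY KEY,
            is_spam     INTEGER NOT NULL,  -- 1 = spam, 0 = ham
            whitelisted INTEGER NOT NULL   -- 1 if a whitelist rule fired
        );
        CREATE TABLE rule_hits (
            message_id INTEGER NOT NULL REFERENCES messages(id),
            rule       TEXT NOT NULL
        );
    """)
    # Toy data for illustration only, not real statistics.
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?)",
                     [(1, 1, 0), (2, 1, 0), (3, 0, 1), (4, 0, 0)])
    conn.executemany("INSERT INTO rule_hits VALUES (?, ?)",
                     [(1, "URIBL_BLACK"), (2, "URIBL_BLACK"),
                      (1, "HTML_MESSAGE"), (3, "HTML_MESSAGE"),
                      (3, "USER_IN_WHITELIST"), (4, "BAYES_00")])
    return conn


def rules_only_on(conn: sqlite3.Connection, is_spam: int) -> list:
    """Rules that fired only on one class of mail, most frequent first."""
    # A rule belongs to exactly one class when MIN and MAX of is_spam
    # over all of its hits are both equal to that class.
    return conn.execute("""
        SELECT h.rule, COUNT(*) AS hits
        FROM rule_hits h JOIN messages m ON m.id = h.message_id
        GROUP BY h.rule
        HAVING MIN(m.is_spam) = ? AND MAX(m.is_spam) = ?
        ORDER BY hits DESC, h.rule
    """, (is_spam, is_spam)).fetchall()


def whitelisted_ham_pct(conn: sqlite3.Connection) -> float:
    """Percentage of ham messages that were whitelisted."""
    (pct,) = conn.execute(
        "SELECT 100.0 * SUM(whitelisted) / COUNT(*) "
        "FROM messages WHERE is_spam = 0").fetchone()
    return pct


if __name__ == "__main__":
    db = build_demo_db()
    print("spam-only:", rules_only_on(db, 1))
    print("ham-only: ", rules_only_on(db, 0))
    print("whitelisted ham %:", whitelisted_ham_pct(db))
```

The same GROUP BY / HAVING queries should work against MySQL with only 
the connection code changed.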

The way I do it: I generate a few custom headers in procmail and have 
SpamAssassin add a couple of special headers, both to make parsing 
easier.  Then I pipe a carbon copy of each email through this bash 
script, which parses it and puts all the data into MySQL.  I just 
finished it today, so I don't have any pretty charts or anything yet, 
but I do think it will meet my needs.
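My real script is bash and relies on my custom headers, so it isn't 
reproduced here.  For anyone who wants the same idea without custom 
headers, here is a hedged Python sketch that reads one message on stdin 
(as procmail would deliver it from a recipe with the `c`, carbon copy, 
flag) and pulls the verdict, score, threshold and rule list out of the 
stock X-Spam-Status header.  It assumes SpamAssassin's default layout 
("Yes, score=... required=... tests=A,B,C ..."); the function name 
parse_spam_status is made up, and writing the result to a database is 
left out.

```python
import email
import re
import sys
from email.message import Message

# Picks "score=15.8", "required=5.0" and "tests=A,B,C" out of the header.
_FIELD_RE = re.compile(r"(score|required|tests)=(\S*)")


def parse_spam_status(msg: Message) -> dict:
    """Parse X-Spam-Status, assuming SA's default layout; {} if absent."""
    raw = msg.get("X-Spam-Status")
    if raw is None:
        return {}
    # Unfold continuation lines (SA wraps long tests= lists).
    status = " ".join(str(raw).split())
    verdict, _, rest = status.partition(",")
    # After unfolding, a wrapped rule list reads "A, B"; rejoin it as "A,B".
    rest = re.sub(r",\s+", ",", rest)
    fields = dict(_FIELD_RE.findall(rest))
    tests = fields.get("tests", "")
    return {
        "is_spam": verdict.strip().lower() == "yes",
        "score": float(fields["score"]) if "score" in fields else None,
        "required": float(fields["required"]) if "required" in fields else None,
        # SA writes "tests=none" when no rule fired.
        "tests": [] if tests in ("", "none") else tests.split(","),
    }


def main() -> None:
    msg = email.message_from_file(sys.stdin)
    print(parse_spam_status(msg))


if __name__ == "__main__":
    main()
```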

I did look at some of the other data collection utilities out there, but 
I didn't see any that were quite this flexible, if I do say so myself.  
Perhaps I am mistaken and there is one (or more) that can do what this 
does and more.

Here's the script, along with many (helpful, I hope) comments.
http://pastebin.com/f743e7daa

Like I said, if any of you smart guys out there see ways to improve 
this, I sure would appreciate the feedback.

Thanks.

Skip

-- 
Get my PGP Public key here:
http://pelorus.org/skip@pelorus.org_public_key.asc