Posted to users@spamassassin.apache.org by Paul Hurley <pa...@paulhurley.co.uk> on 2006/12/22 23:00:29 UTC

Spam Assassin Logfile analysis

Hello

I can't find much information on how people analyse their logfiles etc.
to tune their SA setups, so I'll start with what I'm doing at the moment.

I'm using the Win32 POP3 proxy version of SpamAssassin v3.1.5.1 (from
http://sourceforge.net/projects/sawin32/). It currently doesn't write
much information to the standard logfile, so I'm using grep (specifically
PowerGrep, which works on Windows) to parse several mbox files. Here's
the regex I'm using:

^From \- ([A-Z][a-z]{2} [A-Z][a-z]{2} [0-9]{2} [0-9]{2}\:[0-9]{2}\:[0-9]{2} [0-9]{4}).{10,10000}^X\-Spam\-Status\: (Yes|No)\, score\=((?:\-)?[0-9]{1,4}\.[0-9]{0,2}) required\=[0-9]{1,3}\.[0-9]{1,2} tests\=(.{5,500})autolearn
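If you'd rather script this step than use PowerGrep, here is a minimal Python sketch of the same extraction. It assumes the mbox separates messages with "From - " lines and that each X-Spam-Status header has an autolearn= field after the test list, as in the regex above; PATTERN, extract and the sample text are hypothetical. It differs from the regex above in two deliberate ways: the gap is lazy, so each From line pairs with the first X-Spam-Status header after it rather than the last one within 10,000 characters, and the day field also accepts a space-padded single digit (as in asctime-style dates such as "Dec  1"), in case your client writes dates that way.

```python
import re

# Hypothetical Python port of the PowerGrep regex above.
# MULTILINE makes ^ match at every line start, and DOTALL lets . match
# newlines, so one match can run from the "From - " line down to the header.
PATTERN = re.compile(
    r"^From - ([A-Z][a-z]{2} [A-Z][a-z]{2} [ 0-9][0-9] "
    r"[0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{4})"
    # Lazy gap: stop at the FIRST X-Spam-Status header after the From line.
    r".{10,10000}?"
    r"^X-Spam-Status: (Yes|No), score=(-?[0-9]{1,4}\.[0-9]{0,2})"
    r" required=[0-9]{1,3}\.[0-9]{1,2} tests=(.{5,500}?)autolearn",
    re.MULTILINE | re.DOTALL,
)


def extract(mbox_text):
    """Return (date, verdict, score, rules) for each matched message."""
    results = []
    for m in PATTERN.finditer(mbox_text):
        date, verdict, score, tests = m.groups()
        # The header may be folded over several lines; remove all whitespace
        # before splitting the comma-separated rule list.
        rules = [r for r in "".join(tests.split()).split(",") if r]
        results.append((date, verdict, float(score), rules))
    return results


# Hypothetical two-message mbox fragment for illustration.
sample = (
    "From - Mon Aug 21 18:39:32 2006\n"
    "Subject: first\n"
    "X-Spam-Status: No, score=2.8 required=5.0 tests=AWL,HTML_MESSAGE,\n"
    "\tMY_DSL autolearn=no version=3.1.5\n"
    "\n"
    "From - Fri Dec  1 09:05:00 2006\n"
    "Subject: second\n"
    "X-Spam-Status: Yes, score=12.3 required=5.0 tests=BAYES_99 autolearn=no\n"
)

for row in extract(sample):
    print(row)
```

Reading the mbox into a string (for example with open(path, encoding="latin-1").read()) and passing it to extract gives one tuple per message, ready to write out in whatever format you like.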

I then write the backreferences to another file, which gives me output like this:

Mon Aug 21 18:30:45 2006 No -92.7
AWL,DATE_IN_PAST_12_24,FVGT_u_HAS_2LETTERFLDR,HTML_MESSAGE,HTML_TAG_EXIST_TBODY,L_TITLE_MESSAGE,MK_BAD_HTML_16,MY_NUMPHP,MY_SHRT_IMG,MY_SPACER,NO_RDNS2,RM_rb_ANCHOR,RM_rb_BODY,RM_rb_BREAK,RM_rb_DIV,RM_rb_FONT,RM_rb_HTML,RM_rb_TITLE,USER_IN_WHITELIST,cust_LOCAL_TO_RCVD 


Mon Aug 21 18:39:32 2006 No 2.8
AWL,FCS_URI_NODOTS,FVGT_u_HAS_2LETTERFLDR,HTML_90_100,HTML_EVENT_UNSAFE,HTML_FONT_INVISIBLE,HTML_MESSAGE,ISO_7BITS,J_CHICKENPOX_54,MY_DSL,MY_SHRT_IMG,MY_SPACER,NO_RDNS2,RM_rb_ANCHOR,RM_rb_BREAK,RM_rb_HTML,RM_rb_TITLE,cust_LOCAL_TO_RCVD 


Mon Aug 21 18:40:23 2006 No -95.4
AWL,HTML_60_70,HTML_MESSAGE,MK_BAD_HTML_16,NO_RDNS2,NO_REAL_NAME,RM_rb_ANCHOR,RM_rb_BODY,RM_rb_BREAK,RM_rb_DIV,RM_rb_FONT,RM_rb_HTML,RM_rb_PARA,RM_rb_TITLE,USER_IN_WHITELIST,cust_LOCAL_RTNPATH_RTNPATH,cust_LOCAL_TO_RCVD 
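Reading that file back into a script is straightforward, since each record is a header line (date, verdict, score) followed by a line of comma-separated rule names. A minimal Python sketch, assuming records are separated by blank lines exactly as above; parse_records and the sample text are hypothetical.

```python
def parse_records(text):
    """Parse the two-line records: 'date verdict score', then a rule list."""
    # Drop blank lines and trailing spaces; what remains alternates
    # header line, rule-list line, header line, rule-list line, ...
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    records = []
    for header, rules in zip(lines[0::2], lines[1::2]):
        # The date itself contains spaces, so split the last two fields
        # (verdict and score) off the right-hand end.
        date, verdict, score = header.rsplit(" ", 2)
        records.append((date, verdict, float(score), rules.split(",")))
    return records


# Shortened versions of the first two records above.
sample = (
    "Mon Aug 21 18:30:45 2006 No -92.7\n"
    "AWL,HTML_MESSAGE,USER_IN_WHITELIST \n"
    "\n"
    "\n"
    "Mon Aug 21 18:39:32 2006 No 2.8\n"
    "AWL,MY_DSL,NO_RDNS2 \n"
)

for record in parse_records(sample):
    print(record)
```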



I can then load that file into Excel, do some cleaning up (like turning
the date string into something Excel understands), and then do some
stats.
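The date clean-up could also be done before the data reaches Excel. Python's datetime.strptime reads this ctime-style layout directly; the sketch below rewrites it as "YYYY-MM-DD HH:MM:SS", which Excel generally recognises as a date-time. It assumes English day and month names (the default C locale), and to_iso is a hypothetical helper name.

```python
from datetime import datetime

# ctime-style layout used in the output above, e.g. "Mon Aug 21 18:30:45 2006".
# %a and %b expect English names, so this assumes the C/English locale.
CTIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def to_iso(date_string):
    """Convert 'Mon Aug 21 18:30:45 2006' to '2006-08-21 18:30:45'."""
    parsed = datetime.strptime(date_string, CTIME_FORMAT)
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


print(to_iso("Mon Aug 21 18:30:45 2006"))  # 2006-08-21 18:30:45
```

Whitespace in a strptime format matches one or more whitespace characters, so space-padded days such as "Dec  1" parse as well.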

So for December, out of 2,400 messages, 1.53% were ham and 98.47% were
spam. Now that's depressing!!
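The monthly ham/spam split can be scripted the same way. A minimal sketch, assuming records are (date string, verdict) pairs where the verdict is the Yes/No from X-Spam-Status, with Yes counted as spam; monthly_split is a hypothetical name. Note that it measures what SA tagged, so missed spam and false positives are counted under the wrong heading.

```python
from collections import defaultdict


def monthly_split(records):
    """Map 'Mon YYYY' to (ham %, spam %) from (date_string, verdict) pairs."""
    counts = defaultdict(lambda: [0, 0])  # month -> [ham count, spam count]
    for date_string, verdict in records:
        # "Mon Aug 21 18:30:45 2006".split() gives the month at index 1
        # and the year at index 4.
        parts = date_string.split()
        month = f"{parts[1]} {parts[4]}"
        counts[month][1 if verdict == "Yes" else 0] += 1
    return {
        month: (round(100 * ham / (ham + spam), 2), round(100 * spam / (ham + spam), 2))
        for month, (ham, spam) in counts.items()
    }


# Hypothetical records for illustration.
records = [
    ("Mon Aug 21 18:30:45 2006", "No"),
    ("Fri Dec  1 09:05:00 2006", "Yes"),
    ("Sat Dec  2 10:00:00 2006", "Yes"),
    ("Sun Dec  3 11:00:00 2006", "Yes"),
    ("Mon Dec  4 12:00:00 2006", "No"),
]
print(monthly_split(records))
```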

What I can't work out at the moment is how to do anything useful with
the rules that were hit. I suppose I could create a list of the top ten
rules for spam and for ham...
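A top-ten list like that is easy to produce with collections.Counter. A sketch assuming each record is a (verdict, list of rule names) pair, with the verdict being the Yes/No from X-Spam-Status; top_rules and its ignore parameter are hypothetical. Rules such as AWL, which appears in every sample record above, may dominate both lists, so ignore lets you filter them out.

```python
from collections import Counter


def top_rules(records, verdict, n=10, ignore=()):
    """Return the n most frequently hit rules among records with this verdict."""
    counts = Counter()
    for record_verdict, rules in records:
        if record_verdict == verdict:
            counts.update(r for r in rules if r not in ignore)
    return counts.most_common(n)


# Hypothetical records for illustration.
records = [
    ("No", ["AWL", "HTML_MESSAGE", "USER_IN_WHITELIST"]),
    ("No", ["AWL", "HTML_MESSAGE"]),
    ("Yes", ["AWL", "BAYES_99", "HTML_MESSAGE"]),
    ("Yes", ["BAYES_99"]),
]
print("spam:", top_rules(records, "Yes", ignore={"AWL"}))
print("ham: ", top_rules(records, "No", ignore={"AWL"}))
```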

If I ever work out a better way to do it, or if someone comes up with a
useful thing to do with the rules hit, I'll let you know...

Paul.

-- 
Paul Hurley	http://www.paulhurley.co.uk/
The knack of flying is learning how to throw yourself at the ground and 
miss.
    Hitchhiker's Guide to the Galaxy


Re: Spam Assassin Logfile analysis

Posted by Theo Van Dinter <fe...@apache.org>.
On Fri, Dec 22, 2006 at 10:00:29PM +0000, Paul Hurley wrote:
> What I can't work out at the moment is how to do anything useful with
> the rules that were hit. I suppose I could create a list of the top ten
> rules for spam and for ham...

The idea behind having the log entries in that format is that they could be
run through hit-frequencies (in the masses/ directory) and turned into
something useful, but I don't think there has been any real work on doing
things with it.
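For a rough idea of the kind of per-rule summary that makes this useful, here is a small sketch in the spirit of the SPAM%, HAM% and S/O figures hit-frequencies reports. It is not the masses/ code and does not read its input format; it assumes (verdict, rule list) records with Yes meaning spam, and takes S/O as spam% / (spam% + ham%), so values near 1.0 mean a rule fires mostly on spam and values near 0.0 mostly on ham. rule_stats is a hypothetical name.

```python
from collections import Counter


def rule_stats(records):
    """Return {rule: (spam %, ham %, S/O)} from (verdict, rules) records."""
    spam_hits, ham_hits = Counter(), Counter()
    n_spam = sum(1 for verdict, _ in records if verdict == "Yes")
    n_ham = len(records) - n_spam
    for verdict, rules in records:
        # set() so a rule listed twice in one message counts once.
        (spam_hits if verdict == "Yes" else ham_hits).update(set(rules))
    stats = {}
    for rule in spam_hits.keys() | ham_hits.keys():
        spam_pct = 100 * spam_hits[rule] / n_spam if n_spam else 0.0
        ham_pct = 100 * ham_hits[rule] / n_ham if n_ham else 0.0
        # S/O: fraction of the combined hit rate (spam% + ham%) from spam.
        stats[rule] = (spam_pct, ham_pct, spam_pct / (spam_pct + ham_pct))
    return stats


# Hypothetical records for illustration.
records = [
    ("Yes", ["BAYES_99", "HTML_MESSAGE"]),
    ("Yes", ["BAYES_99"]),
    ("No", ["HTML_MESSAGE"]),
    ("No", ["USER_IN_WHITELIST"]),
]
# Print rules from most spam-specific to most ham-specific.
for rule, (spam, ham, so) in sorted(rule_stats(records).items(), key=lambda kv: -kv[1][2]):
    print(f"{spam:6.1f} {ham:6.1f} {so:5.2f}  {rule}")
```

Sorting by S/O is one way to spot rules that hit a lot of ham, which are the first candidates for lowering their scores.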

fyi.

-- 
Randomly Selected Tagline:
"I am not a vegetarian because I love animals; I am a vegetarian because
 I hate plants."                  - A. Whitney Brown