Posted to users@spamassassin.apache.org by Matthew Yette <my...@mapolce.com> on 2005/08/03 18:04:52 UTC

RE: generating rule stats from spamd logs

No one has any thoughts on this? It's not a quick fix? :(
--
Matthew Yette
Senior Engineer - NOC/Operations
MA Polce Consulting, Inc.
myette@mapolce.com
315-838-1644 (w)
315-356-0597 (f)
AIM/Yahoo: MAPolceNOC
MSN: noc@mapolce.com

-----Original Message-----
From: Matthew Yette 
Sent: Friday, July 29, 2005 8:24 AM
To: users@spamassassin.apache.org
Subject: RE: generating rule stats from spamd logs


I'd code it myself, but I'm not fluent in Perl (PHP guy) and, of course,
the string-parsing functions confuse the hell out of me. LOL. I thought
there might be a lot of Perl coders here who could make this a snap.
[Recipient-domain-based filtering and a date range, too.]

Thanks so much!


-----Original Message-----
From: Matthew Yette 
Sent: Thursday, July 28, 2005 12:07 PM
To: users@spamassassin.apache.org
Subject: RE: generating rule stats from spamd logs


Is there any way to modify this code to accept another command-line
argument for domain-specific filtering? That is, I want to look for all
rule hits for mail destined for domain.com.
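
One way to approach the domain filter, sketched in Python rather than the Perl of sa-stats: assume each scanned message has already been parsed into a record holding a `recipient` address and the list of `rules` that fired. Those field names and the function `rule_hits_for_domain` are hypothetical, and how the recipient is recovered from a spamd log depends on the local setup and is not shown here.

```python
from collections import Counter


def rule_hits_for_domain(records, domain):
    """Count rule hits for messages addressed to `domain`.

    `records` is an iterable of dicts with the hypothetical keys
    "recipient" (e.g. "user@domain.com") and "rules" (list of rule names).
    """
    wanted = domain.lower()
    hits = Counter()
    for rec in records:
        # Split on the last "@"; sep is empty when the address has no "@".
        _local, sep, rcpt_domain = rec["recipient"].rpartition("@")
        if sep and rcpt_domain.lower() == wanted:
            hits.update(rec["rules"])
    return hits


# Illustrative records only; not taken from any real log.
sample = [
    {"recipient": "alice@domain.com", "rules": ["BAYES_99", "URIBL_SBL"]},
    {"recipient": "bob@other.org", "rules": ["BAYES_00"]},
    {"recipient": "carol@DOMAIN.com", "rules": ["BAYES_99"]},
]
domain_hits = rule_hits_for_domain(sample, "domain.com")
```

The same per-record test could gate the counting loop in the script itself, with the domain taken from an extra command-line option.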


-----Original Message-----
From: Dallas L. Engelken [mailto:dallase@nmgi.com] 
Sent: Wednesday, July 27, 2005 1:02 PM
To: users@spamassassin.apache.org
Subject: RE: generating rule stats from spamd logs


My mistake.. It is fixed, hopefully for good.
v0.9 - http://www.rulesemporium.com/programs/sa-stats.txt


TOP SPAM RULES FIRED
------------------------------------------------------------
RANK    RULE NAME                       COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
------------------------------------------------------------
   1    UNPARSEABLE_RELAY               25322     7.35   74.72   99.76   99.13
   2    URIBL_SBL                       22241     6.46   65.63   87.63    0.38
   3    URIBL_JP_SURBL                  21419     6.22   63.20   84.39    0.28
   4    URIBL_BLACK                     19436     5.64   57.35   76.57    0.93
   5    RAZOR2_CF_RANGE_51_100          17562     5.10   51.82   69.19    1.34
   6    RAZOR2_CHECK                    17475     5.07   51.57   68.85    1.15
   7    SARE_SPEC_ROLEX_REP             16553     4.81   48.84   65.22    0.29
   8    SPOOF_COM2OTH                   16537     4.80   48.80   65.15    0.05
   9    RAZOR2_CF_RANGE_E8_51_100       16329     4.74   48.18   64.33    0.16
  10    BAYES_99                        15380     4.47   45.38   60.59    0.28
------------------------------------------------------------
 
TOP HAM RULES FIRED
------------------------------------------------------------
RANK    RULE NAME                       COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
------------------------------------------------------------
   1    UNPARSEABLE_RELAY                8433    18.93   24.88   99.76   99.13
   2    BAYES_00                         7005    15.72   20.67    0.74   82.34
   3    AWL                              4904    11.01   14.47   26.64   57.65
   4    HTML_MESSAGE                     3813     8.56   11.25   22.92   44.82
   5    NO_REAL_NAME                     1453     3.26    4.29   37.79   17.08
   6    HTML_80_90                       1279     2.87    3.77   10.98   15.03
   7    MIME_HTML_ONLY                    972     2.18    2.87    6.88   11.43
   8    HTML_FONT_BIG                     794     1.78    2.34    9.28    9.33
   9    BAYES_50                          625     1.40    1.84   25.40    7.35
  10    HTML_FONT_FACE_BAD                545     1.22    1.61    0.76    6.41
------------------------------------------------------------

 


________________________________

	From: Steve Martin [mailto:steve@planomartins.com] 
	Sent: Wednesday, July 27, 2005 11:44 AM
	To: Andy Jezierski
	Cc: Dallas L. Engelken; users@spamassassin.apache.org
	Subject: Re: generating rule stats from spamd logs
	
	
	He only fixed the spam rules section. 

	The TOP HAM RULES section still has these two incorrect computations...

	    my $perc2=sprintf("%.2f",($HAM_RULES{$key}/$NUM_SPAM)*100);
	    my $perc3=sprintf("%.2f",($SPAM_RULES{$key}/$NUM_HAM)*100);


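	Those compute the number of times a rule fired on ham divided by
	the total number of spam messages, and the number of times a rule
	fired on spam divided by the total number of ham messages. Each
	count needs to be divided by the total for its own class: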

	    my $perc2=sprintf("%.2f",($SPAM_RULES{$key}/$NUM_SPAM)*100);
	    my $perc3=sprintf("%.2f",($HAM_RULES{$key}/$NUM_HAM)*100);
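
As a worked check of the corrected computation, here is a minimal sketch in Python (not the Perl of sa-stats itself). The function name `rule_percentages` and its arguments are hypothetical. The totals come from the buggy output quoted further down this thread (655 spam, 5456 ham, BAYES_00 firing on 4079 ham); the figure of 100 spam hits for BAYES_00 is inferred from the buggy 1.83 value and is not stated anywhere in the thread.

```python
def rule_percentages(rule, spam_rules, ham_rules, num_spam, num_ham):
    """Return (%OFSPAM, %OFHAM) for one rule, each rounded to 2 places.

    Spam hits are divided by the spam total and ham hits by the ham
    total, which is the pairing the corrected lines use.
    """
    spam_hits = spam_rules.get(rule, 0)
    ham_hits = ham_rules.get(rule, 0)
    # Guard against empty classes so a log with no spam (or no ham)
    # does not raise ZeroDivisionError.
    pct_of_spam = 100.0 * spam_hits / num_spam if num_spam else 0.0
    pct_of_ham = 100.0 * ham_hits / num_ham if num_ham else 0.0
    return round(pct_of_spam, 2), round(pct_of_ham, 2)


# Counts taken or inferred from the quoted output (see note above).
spam_rules = {"BAYES_00": 100}   # inferred, not stated in the thread
ham_rules = {"BAYES_00": 4079}
pct_spam, pct_ham = rule_percentages("BAYES_00", spam_rules, ham_rules, 655, 5456)

# The swapped (buggy) pairing divides ham hits by the spam total.
buggy_pct_spam = round(100.0 * ham_rules["BAYES_00"] / 655, 2)
```

With those numbers the corrected %OFSPAM for BAYES_00 comes out at 15.27, which matches the figure Dallas mentions, while the swapped pairing reproduces the impossible 622.75 from the quoted table.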

	On Jul 27, 2005, at 11:32 AM, Andy Jezierski wrote:

		"Dallas L. Engelken" <da...@nmgi.com> wrote on 07/27/2005 11:26:54 AM:
		
		> > -----Original Message-----
		> > From: Chris Thielen [mailto:cmt-spamassassin@someone.dhs.org]
		> > Sent: Wednesday, July 27, 2005 11:02 AM
		> > To: Dallas L. Engelken
		> > Cc: users@spamassassin.apache.org
		> > Subject: Re: generating rule stats from spamd logs
		> >
		> > Dallas L. Engelken wrote:
		> >
		> > >BAYES_00 hits 15.27 of spam on yours, the %ofspam on top ham rules and
		> > >%ofham on top spam rules must be buggy.
		> > >
		> > >i'm not running that version with the 5th column. It must be buggy.
		> > >i play with it after bit.
		> > >
		> > >Dallas
		> > >
		> >
		> > Dallas,
		> >
		> > Did you see the patch I sent to the SARE list?  Just need to
		> > swap two hash lookups.
		> >
		>
		> Yup yup. http://www.rulesemporium.com/programs/sa-stats.txt updated.
		>
		> D
		
		
		Something's still a little fishy.  SA 3.1 latest SVN, if it makes any difference.
		
		python# ./sa-stats -f maillog.0 -n 5
		Email:     6111  Autolearn:   226  AvgScore:   2.15  AvgScanTime:  3.91 sec
		Spam:       655  Autolearn:   133  AvgScore:  14.81  AvgScanTime:  3.76 sec
		Ham:       5456  Autolearn:    93  AvgScore:   0.63  AvgScanTime:  3.93 sec
		
		Time Spent Running SA:         6.64 hours
		Time Spent Processing Spam:    0.68 hours
		Time Spent Processing Ham:     5.96 hours
		
		TOP SPAM RULES FIRED
		------------------------------------------------------------
		RANK    RULE NAME                       COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
		------------------------------------------------------------
		   1    HTML_MESSAGE                      496     5.67    8.12   75.73   62.19
		   2    DCC_CHECK                         310     3.55    5.07   47.33    7.02
		   3    BAYES_99                          305     3.49    4.99   46.56    0.02
		   4    RAZOR2_CHECK                      277     3.17    4.53   42.29    4.23
		   5    DIGEST_MULTIPLE                   251     2.87    4.11   38.32    2.42
		------------------------------------------------------------
		
		TOP HAM RULES FIRED
		------------------------------------------------------------
		RANK    RULE NAME                       COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
		------------------------------------------------------------
		   1    BAYES_00                         4079    14.05   66.75  622.75    1.83
		   2    HTML_MESSAGE                     3393    11.68   55.52  518.02    9.09
		   3    NO_REAL_NAME                     1053     3.63   17.23  160.76    1.06
		   4    HTML_80_90                        931     3.21   15.23  142.14    2.35
		   5    LG_4C_2V_3C                       798     2.75   13.06  121.83    2.20
		------------------------------------------------------------
		
		 


	--

	Steve Martin                    http://www.cheezmo.com/
	Smart Calibration, LLC          http://www.smartcalibration.com/
	The Widescreen Movie Center     http://www.widemovies.com/
	Letterboxed Movie TV Schedule   http://www.widemovies.com/lbx.html