Posted to users@spamassassin.apache.org by Matthew Yette <my...@mapolce.com> on 2005/08/03 18:04:52 UTC
RE: generating rule stats from spamd logs
No one has any thoughts on this? It's not a quick fix? :(
--
Matthew Yette
Senior Engineer - NOC/Operations
MA Polce Consulting, Inc.
myette@mapolce.com
315-838-1644 (w)
315-356-0597 (f)
AIM/Yahoo: MAPolceNOC
MSN: noc@mapolce.com
-----Original Message-----
From: Matthew Yette
Sent: Friday, July 29, 2005 8:24 AM
To: users@spamassassin.apache.org
Subject: RE: generating rule stats from spamd logs
I'd be able to code it myself, but I'm not fluent in Perl (PHP guy),
and of course the string parsing functions confuse the hell out of me.
LOL. I thought there might be a lot of Perl coders here who could make
this a snap. [Recipient-domain-based filtering and a date range, too.]
Thanks so much!
-----Original Message-----
From: Matthew Yette
Sent: Thursday, July 28, 2005 12:07 PM
To: users@spamassassin.apache.org
Subject: RE: generating rule stats from spamd logs
Is there any way to modify this code to accept another command-line
argument for domain-specific filtering? That is, I want to see all rule
hits for mail destined for domain.com.
-----Original Message-----
From: Dallas L. Engelken [mailto:dallase@nmgi.com]
Sent: Wednesday, July 27, 2005 1:02 PM
To: users@spamassassin.apache.org
Subject: RE: generating rule stats from spamd logs
My mistake.. It is fixed, hopefully for good.
v0.9 - http://www.rulesemporium.com/programs/sa-stats.txt
TOP SPAM RULES FIRED
--------------------------------------------------------------------------
RANK  RULE NAME                  COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
--------------------------------------------------------------------------
   1  UNPARSEABLE_RELAY          25322      7.35    74.72    99.76   99.13
   2  URIBL_SBL                  22241      6.46    65.63    87.63    0.38
   3  URIBL_JP_SURBL             21419      6.22    63.20    84.39    0.28
   4  URIBL_BLACK                19436      5.64    57.35    76.57    0.93
   5  RAZOR2_CF_RANGE_51_100     17562      5.10    51.82    69.19    1.34
   6  RAZOR2_CHECK               17475      5.07    51.57    68.85    1.15
   7  SARE_SPEC_ROLEX_REP        16553      4.81    48.84    65.22    0.29
   8  SPOOF_COM2OTH              16537      4.80    48.80    65.15    0.05
   9  RAZOR2_CF_RANGE_E8_51_100  16329      4.74    48.18    64.33    0.16
  10  BAYES_99                   15380      4.47    45.38    60.59    0.28
--------------------------------------------------------------------------
TOP HAM RULES FIRED
--------------------------------------------------------------------------
RANK  RULE NAME                  COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
--------------------------------------------------------------------------
   1  UNPARSEABLE_RELAY           8433     18.93    24.88    99.76   99.13
   2  BAYES_00                    7005     15.72    20.67     0.74   82.34
   3  AWL                         4904     11.01    14.47    26.64   57.65
   4  HTML_MESSAGE                3813      8.56    11.25    22.92   44.82
   5  NO_REAL_NAME                1453      3.26     4.29    37.79   17.08
   6  HTML_80_90                  1279      2.87     3.77    10.98   15.03
   7  MIME_HTML_ONLY               972      2.18     2.87     6.88   11.43
   8  HTML_FONT_BIG                794      1.78     2.34     9.28    9.33
   9  BAYES_50                     625      1.40     1.84    25.40    7.35
  10  HTML_FONT_FACE_BAD           545      1.22     1.61     0.76    6.41
--------------------------------------------------------------------------
________________________________
From: Steve Martin [mailto:steve@planomartins.com]
Sent: Wednesday, July 27, 2005 11:44 AM
To: Andy Jezierski
Cc: Dallas L. Engelken; users@spamassassin.apache.org
Subject: Re: generating rule stats from spamd logs
He only fixed the spam rules section.
The TOP HAM RULES section still has these two incorrect
computations:
my $perc2=sprintf("%.2f",($HAM_RULES{$key}/$NUM_SPAM)*100);
my $perc3=sprintf("%.2f",($SPAM_RULES{$key}/$NUM_HAM)*100);
The first divides the number of times a rule fired on ham by the total
number of spam messages. The second divides the number of times a rule
fired on spam by the total number of ham messages.
They should be:
my $perc2=sprintf("%.2f",($SPAM_RULES{$key}/$NUM_SPAM)*100);
my $perc3=sprintf("%.2f",($HAM_RULES{$key}/$NUM_HAM)*100);
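As a worked check of the two pairs of computations, here is a small sketch in Python (not the Perl that sa-stats itself is written in). It uses the BAYES_00 row and the message totals from the sa-stats output quoted further down in this message. The count of 100 spam hits for BAYES_00 is not printed anywhere in the thread. It is an assumed value, chosen because it reproduces the reported 1.83 and 15.27 figures. The helper name pct is made up for this illustration.

```python
def pct(hits, total):
    """Format hits/total as a percentage with two decimals, like sprintf("%.2f")."""
    return f"{hits / total * 100:.2f}"

# Message totals taken from the quoted sa-stats run (Spam: 655, Ham: 5456).
NUM_SPAM = 655
NUM_HAM = 5456

# BAYES_00 hit counts. 4079 is the COUNT shown in the TOP HAM RULES table.
# 100 is an ASSUMED spam hit count, inferred from the reported percentages.
ham_hits = 4079
spam_hits = 100

# Mismatched version: each hit count is divided by the wrong message total.
buggy_ofspam = pct(ham_hits, NUM_SPAM)
buggy_ofham = pct(spam_hits, NUM_HAM)

# Corrected version: spam hits over spam total, ham hits over ham total.
fixed_ofspam = pct(spam_hits, NUM_SPAM)
fixed_ofham = pct(ham_hits, NUM_HAM)

print("buggy: %OFSPAM", buggy_ofspam, "%OFHAM", buggy_ofham)
print("fixed: %OFSPAM", fixed_ofspam, "%OFHAM", fixed_ofham)
```

Under those assumptions the mismatched version reproduces the impossible 622.75 and the 1.83 seen in the quoted TOP HAM RULES table, while the corrected version yields 15.27 and 74.76, both of which stay within 0 to 100 as a percentage should.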
On Jul 27, 2005, at 11:32 AM, Andy Jezierski wrote:
"Dallas L. Engelken" <da...@nmgi.com> wrote on 07/27/2005 11:26:54 AM:
> > -----Original Message-----
> > From: Chris Thielen [mailto:cmt-spamassassin@someone.dhs.org]
> > Sent: Wednesday, July 27, 2005 11:02 AM
> > To: Dallas L. Engelken
> > Cc: users@spamassassin.apache.org
> > Subject: Re: generating rule stats from spamd logs
> >
> > Dallas L. Engelken wrote:
> >
> > >BAYES_00 hits 15.27 of spam on yours, the %ofspam on top ham rules and
> > >%ofham on top spam rules must be buggy.
> > >
> > >i'm not running that version with the 5th column. It must be buggy.
> > >i play with it after bit.
> > >
> > >Dallas
> > >
> >
> > Dallas,
> >
> > Did you see the patch I sent to the SARE list? Just need to
> > swap two hash lookups.
> >
>
> Yup yup. http://www.rulesemporium.com/programs/sa-stats.txt updated.
>
> D
Something's still a little fishy. SA 3.1 latest SVN, if it makes any difference.
python# ./sa-stats -f maillog.0 -n 5
Email:  6111  Autolearn:  226  AvgScore:   2.15  AvgScanTime:  3.91 sec
Spam:    655  Autolearn:  133  AvgScore:  14.81  AvgScanTime:  3.76 sec
Ham:    5456  Autolearn:   93  AvgScore:   0.63  AvgScanTime:  3.93 sec
Time Spent Running SA: 6.64 hours
Time Spent Processing Spam: 0.68 hours
Time Spent Processing Ham: 5.96 hours
TOP SPAM RULES FIRED
--------------------------------------------------------------------------
RANK  RULE NAME                  COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
--------------------------------------------------------------------------
   1  HTML_MESSAGE                 496      5.67     8.12    75.73   62.19
   2  DCC_CHECK                    310      3.55     5.07    47.33    7.02
   3  BAYES_99                     305      3.49     4.99    46.56    0.02
   4  RAZOR2_CHECK                 277      3.17     4.53    42.29    4.23
   5  DIGEST_MULTIPLE              251      2.87     4.11    38.32    2.42
--------------------------------------------------------------------------
TOP HAM RULES FIRED
--------------------------------------------------------------------------
RANK  RULE NAME                  COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
--------------------------------------------------------------------------
   1  BAYES_00                    4079     14.05    66.75   622.75    1.83
   2  HTML_MESSAGE                3393     11.68    55.52   518.02    9.09
   3  NO_REAL_NAME                1053      3.63    17.23   160.76    1.06
   4  HTML_80_90                   931      3.21    15.23   142.14    2.35
   5  LG_4C_2V_3C                  798      2.75    13.06   121.83    2.20
--------------------------------------------------------------------------
--
Steve Martin
http://www.cheezmo.com/
Smart Calibration, LLC
http://www.smartcalibration.com/
The Widescreen Movie Center
http://www.widemovies.com/
Letterboxed Movie TV Schedule http://www.widemovies.com/lbx.html