Posted to users@spamassassin.apache.org by Charles Sprickman <sp...@bway.net> on 2016/02/29 06:24:50 UTC
Missed spam, suggestions?
Hi all,
Recently I have occasionally been getting bursts of spam that slip through Postfix (postscreen BL checks, protocol checks) and SpamAssassin. I just had another big jump in the last week. This was mostly spam touting Oil Changes, SUV sales and Lawyer Finders.
What I just did was go through a collection of missed spam and re-run it through spamassassin. All of it jumped from originally scoring around 2-3 to a minimum of 6.5, with most hitting around 12. The biggest difference I see is that DNSBL and URIBL services had started hitting. When originally received, these emails all originated from very clean IPs.
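A rough way to see exactly which rules account for the jump is to compare the X-Spam-Status header from the original delivery with the one from the re-scan. The sketch below assumes the usual "score=... required=... tests=A,B,C autolearn=..." header layout; the two header values are made up for illustration and are not taken from the linked samples.

```python
import re


def parse_spam_status(header_value):
    """Return (score, rule_names) parsed from an X-Spam-Status header value."""
    # Long headers are folded across lines, so collapse all whitespace first.
    flat = re.sub(r"\s+", " ", header_value).strip()
    score_match = re.search(r"score=(-?\d+(?:\.\d+)?)", flat)
    tests_match = re.search(r"tests=(.*?)(?: autolearn=| version=|$)", flat)
    score = float(score_match.group(1)) if score_match else None
    rules = set()
    if tests_match:
        for name in tests_match.group(1).split(","):
            name = name.strip()
            if name and name != "none":
                rules.add(name)
    return score, rules


def compare_scans(original_header, rescan_header):
    """Report the score change and which rules only fired on the re-scan."""
    old_score, old_rules = parse_spam_status(original_header)
    new_score, new_rules = parse_spam_status(rescan_header)
    return {
        "score_delta": round(new_score - old_score, 2),
        "newly_hit": sorted(new_rules - old_rules),
        "no_longer_hit": sorted(old_rules - new_rules),
    }


# Made-up header values for illustration only.
original = ("No, score=2.7 required=5.0 tests=HTML_MESSAGE,SPF_HELO_PASS "
            "autolearn=no version=3.4.1")
rescan = ("Yes, score=14.0 required=5.0 tests=HTML_MESSAGE,SPF_HELO_PASS,\n"
          "\tURIBL_BLACK,URIBL_DBL_SPAM autolearn=no version=3.4.1")
print(compare_scans(original, rescan))
```

If the "newly_hit" list is dominated by DNSBL and URIBL rules, that supports the reading that the messages arrived before the lists had caught up.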
I have TXREP enabled as well, but that doesn’t seem to be having either a positive or negative impact.
What are my options to try to catch this junk before it hits the various *BLs?
I’ve not had much luck with Bayes - when I had it enabled recently on a per-user basis it was just hitting the master DB server too hard with updates. I’m considering enabling it again with a shared db for all users, which I hope might work better. It would only be auto trained, perhaps with some manual training by me.
Here’s a few samples, hosted elsewhere so as not to trip anyone’s filters:
https://gist.github.com/anonymous/0fcaf481875959c9151f (2.7 on Friday, 14 tonight)
https://gist.github.com/anonymous/a5396f68699392808988 (3.4 earlier tonight, 6.5 just now)
I have more samples, I can dig them up if that’s helpful.
Sometimes I wonder how much this has to do with the age of our domain and the fact that it begins with “b”. :)
The only thing I’ve been contemplating is a local spamtrap and DNSBL. We have a site that’s regularly trawled for email addresses, so seeding it should not be too difficult…
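For anyone unfamiliar with how a local DNSBL gets queried: clients reverse the IPv4 octets and append the list's zone, and a listed address conventionally answers with an A record such as 127.0.0.2. Here is a minimal sketch of turning spamtrap hits into the names such a zone would need to answer. The zone name, the function names and the two-hit threshold are assumptions for illustration only.

```python
import ipaddress
from collections import Counter


def dnsbl_query_name(ip, zone):
    """Build the DNS name a client looks up to check an IPv4 address in a DNSBL."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def names_to_list(trap_sender_ips, zone, min_hits=2):
    """Return query names for senders that hit the spamtrap at least min_hits times."""
    hits = Counter(trap_sender_ips)
    return sorted(
        dnsbl_query_name(ip, zone) for ip, count in hits.items() if count >= min_hits
    )


# Hypothetical trap hits, using documentation-range addresses.
trap_hits = ["192.0.2.25", "198.51.100.7", "192.0.2.25"]
print(names_to_list(trap_hits, "dnsbl.example.net"))
```

Requiring more than one trap hit before listing is only one possible guard against listing a legitimate sender that mistyped an address.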
Charles
Re: Missed spam, suggestions?
Posted by John Hardin <jh...@impsec.org>.
On Mon, 29 Feb 2016, Charles Sprickman wrote:
> My concern with disabling autolearn is that then I’m the only one
> training. My spam probably looks like everyone else’s, but my ham is
> very different, lots of list traffic and such.
You can still have your users provide misses for training, you'd just need
to vet the messages before feeding them to sa-learn (unless you really
trust a given user's judgement and honesty - the big problem is users
training messages from lists they actually did subscribe to as spam,
rather than unsubscribing).
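One possible aid for that vetting step, sketched under the assumption that user submissions arrive as raw RFC 822 message text: set aside anything carrying mailing-list headers so it gets a closer look before it is trained as spam. The header choice and function names are illustrative, and messages in the other bucket would still deserve a human glance.

```python
from email import message_from_string

# Headers that usually indicate mailing-list traffic (an assumption, not a
# complete test: some spam forges these and some lists omit them).
LIST_HEADERS = ("List-Id", "List-Unsubscribe")


def looks_like_list_mail(raw_message):
    """True if the message carries any of the mailing-list headers above."""
    msg = message_from_string(raw_message)
    return any(msg.get(name) is not None for name in LIST_HEADERS)


def triage(submissions):
    """Split user-submitted 'spam' into (probable list mail, everything else)."""
    list_mail, other = [], []
    for raw in submissions:
        (list_mail if looks_like_list_mail(raw) else other).append(raw)
    return list_mail, other


# Made-up submissions for illustration.
newsletter = "From: news@example.org\nList-Unsubscribe: <mailto:off@example.org>\n\nWeekly digest\n"
junk = "From: deals@example.com\nSubject: Oil change offer\n\nClick here\n"
list_mail, other = triage([newsletter, junk])
print(len(list_mail), len(other))
```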
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
We should endeavour to teach our children to be gun-proof
rather than trying to design our guns to be child-proof
-----------------------------------------------------------------------
13 days until Albert Einstein's 137th Birthday
Re: Missed spam, suggestions?
Posted by Robert Chalmers <ro...@chalmers.com.au>.
Sorry - I missed the post from dbfunk. I just saw it in the archive. sa-stats.pl is the program,
and you have to feed it from spamd.log to get those stats.
To get a spamd.log, you have to start spamd with this option:
-s facility, --syslog=facility
Specify the syslog facility to use (default: mail). If "stderr" is specified, output will be written to stderr. (This is useful if you're running spamd under the daemontools package.) With a facility of "file", all output goes to spamd.log. The facility is interpreted as a file name to log to if it contains any characters except a-z and 0-9. "null" disables logging completely (used internally).
spamd -s /var/log/spamd.log # log to file /var/log/spamd.log
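To give a feel for what a rule-hit report does with that log, here is a minimal sketch that tallies rule hits separately for spam and ham. It assumes spamd writes result lines of the form "spamd: result: Y 6 - RULE_A,RULE_B scantime=..." (with "." in place of "Y" for ham); check your own log before relying on the pattern. The sample lines are made up, and this is not how sa-stats.pl itself is implemented.

```python
import re
from collections import Counter

# Assumed layout of a spamd result line; adjust if your log differs.
RESULT_RE = re.compile(
    r"spamd: result: (?P<flag>[Y.]) (?P<score>-?\d+) - (?P<rules>\S*) scantime="
)


def rule_counts(log_lines):
    """Return (spam_counter, ham_counter) of rule hits from spamd log lines."""
    spam, ham = Counter(), Counter()
    for line in log_lines:
        match = RESULT_RE.search(line)
        if not match:
            continue  # not a result line
        rules = [name for name in match.group("rules").split(",") if name]
        (spam if match.group("flag") == "Y" else ham).update(rules)
    return spam, ham


def top_rules(counter, n=10):
    """Top n rules by hit count, as (rule, count) pairs."""
    return counter.most_common(n)


# Made-up log lines for illustration.
SAMPLE_LOG = [
    "Mar 11 02:14:07 mailhost spamd[4242]: spamd: result: Y 6 - "
    "BAYES_99,HTML_MESSAGE,RDNS_NONE scantime=5.3,size=4812,user=alice,autolearn=no",
    "Mar 11 02:15:40 mailhost spamd[4242]: spamd: result: . -2 - "
    "BAYES_00,DKIM_SIGNED,DKIM_VALID,HTML_MESSAGE scantime=6.6,size=2310,user=alice,autolearn=ham",
    "Mar 11 02:16:02 mailhost spamd[4242]: spamd: connection from localhost [127.0.0.1] at port 50111",
]
spam_hits, ham_hits = rule_counts(SAMPLE_LOG)
print(top_rules(spam_hits, 3))
print(top_rules(ham_hits, 3))
```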
> On 10 Mar 2016, at 21:38, Erickarlo Porro <ep...@earthcam.com> wrote:
>
> I would like to know how to get these stats too.
>
> From: Robert Chalmers [mailto:robert@chalmers.com.au]
> Sent: Tuesday, March 08, 2016 5:25 AM
> To: users@spamassassin.apache.org
> Subject: Re: Missed spam, suggestions?
>
> Can I ask, how are you getting these stats please?
>
> Thanks
> On 8 Mar 2016, at 05:11, David B Funk <dbfunk@engineering.uiowa.edu> wrote:
>
> On Mon, 7 Mar 2016, Charles Sprickman wrote:
>
>
> I’ve been running with some daily training for a little over a week and I’m seeing less spam in my inbox. I’ve seen a few things slip through because Bayes tipped them below the default score; these were two phishing emails.
>
> Here’s some rule stats for anyone interested:
>
> TOP SPAM RULES FIRED
>
> RANK  RULE NAME              COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
>
>    1  TXREP                  13171      8.47    40.38    91.00   72.91
>    2  HTML_MESSAGE           12714      8.18    38.98    87.85   90.80
>    3  DCC_CHECK              10593      6.81    32.48    73.19   33.78
>    4  RDNS_NONE              10269      6.60    31.48    70.95    5.63
>    5  SPF_HELO_PASS          10070      6.48    30.87    69.58   23.41
>    6  URIBL_BLACK             9711      6.25    29.77    67.10    1.58
>    7  BODY_NEWDOMAIN_FMBLA    9550      6.14    29.28    65.98    1.64
>    8  FROM_NEWDOMAIN_FMBLA    9483      6.10    29.07    65.52    1.36
>    9  BAYES_99                8486      5.46    26.02    58.63    1.18
>   10  BAYES_999               8141      5.24    24.96    56.25    1.06
>
> TOP HAM RULES FIRED
>
> RANK  RULE NAME              COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
>
>    1  HTML_MESSAGE           16473      9.13    50.51    87.85   90.80
>    2  DKIM_SIGNED            13776      7.64    42.24    13.81   75.93
>    3  TXREP                  13228      7.33    40.56    91.00   72.91
>    4  DKIM_VALID             12962      7.19    39.74    11.93   71.44
>    5  RCVD_IN_DNSWL_NONE      9941      5.51    30.48     8.08   54.79
>    6  DKIM_VALID_AU           8711      4.83    26.71     7.99   48.01
>    7  BAYES_00                8390      4.65    25.72     1.84   46.24
>    8  RCVD_IN_JMF_W           7369      4.09    22.59     2.54   40.62
>    9  RCVD_IN_MSPIKE_WL       6713      3.72    20.58     4.39   37.00
>   10  BAYES_50                6201      3.44    19.01    25.56   34.18
>
>
> Based upon your stats it looks like you need more Bayes training. Your Bayes 00/99 hits should rank higher in the rules-fired stats and BAYES_50 shouldn't be in the top-10 at all.
> (of course if you've only been training for a week that would explain it).
>
> For example, here's my top-10 hits (for a one month interval).
>
> TOP SPAM RULES FIRED
> ----------------------------------------------------------------------
> RANK  RULE NAME              COUNT  %OFMAIL  %OFSPAM  %OFHAM     S/O
> ----------------------------------------------------------------------
>    1  T__BOTNET_NOTRUST     114907    60.32    86.81   42.66  0.5755
>    2  BAYES_99              109138    32.98    82.45    0.01  0.9998
>    3  BAYES_999             104903    31.70    79.25    0.01  0.9999
>    4  HTML_MESSAGE           90850    79.41    68.63   86.59  0.3456
>    5  URIBL_BLACK            90845    27.61    68.63    0.27  0.9942
>    6  T_QUARANTINE_1         90640    27.40    68.47    0.02  0.9996
>    7  URIBL_DBL_SPAM         79152    24.02    59.79    0.17  0.9956
>    8  KAM_VERY_BLACK_DBL     74301    22.45    56.13    0.00  1.0000
>    9  L_FROM_SPAMMER1k       73667    22.26    55.65    0.00  1.0000
>   10  T__RECEIVED_1          72413    42.60    54.70   34.54  0.5135
>
> TOP HAM RULES FIRED
> ----------------------------------------------------------------------
> RANK  RULE NAME              COUNT  %OFMAIL  %OFSPAM  %OFHAM     S/O
> ----------------------------------------------------------------------
>    1  BAYES_00              182674    56.03     2.11   91.97  0.0150
>    2  HTML_MESSAGE          171992    79.41    68.63   86.59  0.3456
>    3  SPF_PASS              136623    63.08    54.52   68.78  0.3457
>    4  T_RP_MATCHES_RCVD     130879    53.75    35.54   65.89  0.2644
>    5  T__RECEIVED_2         125492    53.76    39.62   63.18  0.2947
>    6  DKIM_SIGNED           114808    38.57     9.72   57.80  0.1008
>    7  DKIM_VALID            105385    34.70     7.16   53.06  0.0825
>    8  RCVD_IN_DNSWL_NONE     92951    29.90     4.56   46.80  0.0609
>    9  T__BOTNET_NOTRUST      84741    60.32    86.81   42.66  0.5755
>   10  KHOP_RCVD_TRUST        84623    26.44     2.19   42.60  0.0331
>
> Note how highly BAYES 00/99 ranked. What you don't see is that BAYES_50 is way down in the mud (below 50 rank).
>
> BTW, this is with a Bayes that is mostly fed via auto-learning. I occasionally
> hand feed corner cases that get mis-classified (usually things like phishes, or conference announcements that can look shaky).
>
>
> --
> Dave Funk University of Iowa
> <dbfunk (at) engineering.uiowa.edu> College of Engineering
> 319/335-5751 FAX: 319/384-0549 1256 Seamans Center
> Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
> #include <std_disclaimer.h>
> Better is not better, 'standard' is better. B{
>
> Robert Chalmers
> robert@chalmers.com.au Quantum Radio: http://tinyurl.com/lwwddov
> Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11. XCode 7.2.1
> 2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay
Robert Chalmers
robert@chalmers.com.au Quantum Radio: http://tinyurl.com/lwwddov
Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11. XCode 7.2.1
2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay
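A note on the S/O column in Dave Funk's tables quoted above: it appears to be spam hits divided by total hits for the rule. That reading is inferred rather than stated in the thread, but it reproduces the two rules that appear in both his spam and ham tables. A small check using the counts from those tables:

```python
def spam_over_overall(spam_hits, ham_hits):
    """Fraction of a rule's hits that were on spam (the assumed meaning of S/O)."""
    total = spam_hits + ham_hits
    if total == 0:
        return None  # rule never fired
    return spam_hits / total


# (spam count, ham count) taken from the two quoted tables.
counts = {
    "T__BOTNET_NOTRUST": (114907, 84741),
    "HTML_MESSAGE": (90850, 171992),
}
for rule, (spam, ham) in counts.items():
    print(rule, round(spam_over_overall(spam, ham), 4))
```

Under this reading, a value near 1.0 (BAYES_99, URIBL_BLACK) means the rule fires almost only on spam, a value near 0.0 (BAYES_00) means almost only on ham, and values in between say little on their own.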
Re: Missed spam, suggestions?
Posted by Robert Chalmers <ro...@chalmers.com.au>.
Thanks, yes, confusion had set in there … now I’m on the right track
It will however be handy to have both.
Robert
> On 11 Mar 2016, at 14:59, Dave Funk <db...@engineering.uiowa.edu> wrote:
>
> TL;DR
> You want Dallas Engelken's "sa-stats.pl" NOT the one from SA.
>
> This is confusing because there are two different programs named "sa-stats.pl".
>
> The one that comes with SpamAssassin (what you're referring to) is an engine stats reporting tool; does not do rule hits analysis.
>
> The tool that Charles Sprickman and I used is the one from Dallas Engelken.
> See: http://wiki.apache.org/spamassassin/StatsAndAnalyzers
> be sure to search that page for reference to Dallas Engelken.
>
>
>
> On Fri, 11 Mar 2016, Robert Chalmers wrote:
>
>> The sa-stats.pl I refer to is here:
>> https://spamassassin.apache.org/full/3.0.x/dist/tools/sa-stats.pl
>> It’s not the same as the one shown in other posts; I don’t know what that one is.
>> Mine has an output like this.
>> zeus:~ robert$ perl sa-stats.pl
>> Report Title : SpamAssassin - Spam Statistics
>> Report Date : 2016-03-11
>> Period Beginning : Fri 11 Mar 00:00:00 2016
>> Period Ending : Sat 12 Mar 00:00:00 2016
>> Reporting Period : 24.00 hrs
>> --------------------------------------------------
>> Note: 'ham' = 'nonspam'
>> Total spam detected : 22 ( 51.16%)
>> Total ham accepted : 21 ( 48.84%)
>> -------------------
>> Total emails processed : 43 ( 2/hr)
>> Average spam threshold : 3.00
>> Average spam score : 4.46
>> Average ham score : -2.10
>> Spam kbytes processed : 397 ( 17 kb/hr)
>> Ham kbytes processed : 147 ( 6 kb/hr)
>> Total kbytes processed : 545 ( 23 kb/hr)
>> Spam analysis time : 339 s ( 14 s/hr)
>> Ham analysis time : 366 s ( 15 s/hr)
>> Total analysis time : 706 s ( 29 s/hr)
>> Statistics by Hour
>> ----------------------------------------------------
>> Hour Spam Ham
>> ------------- ----------------- --------------
>> 2016-03-11 00 0 ( 0%) 13 (100%)
>> 2016-03-11 01 0 ( 0%) 0 ( 0%)
>> 2016-03-11 02 2 (100%) 0 ( 0%)
>> 2016-03-11 03 4 (100%) 0 ( 0%)
>> 2016-03-11 04 4 ( 57%) 3 ( 42%)
>> 2016-03-11 05 6 ( 75%) 2 ( 25%)
>> 2016-03-11 06 6 (100%) 0 ( 0%)
>> 2016-03-11 07 0 ( 0%) 3 (100%)
>> 2016-03-11 08 0 ( 0%) 0 ( 0%)
>> 2016-03-11 09 0 ( 0%) 0 ( 0%)
>> 2016-03-11 10 0 ( 0%) 0 ( 0%)
>> 2016-03-11 11 0 ( 0%) 0 ( 0%)
>> 2016-03-11 12 0 ( 0%) 0 ( 0%)
>> 2016-03-11 13 0 ( 0%) 0 ( 0%)
>> 2016-03-11 14 0 ( 0%) 0 ( 0%)
>> 2016-03-11 15 0 ( 0%) 0 ( 0%)
>> 2016-03-11 16 0 ( 0%) 0 ( 0%)
>> 2016-03-11 17 0 ( 0%) 0 ( 0%)
>> 2016-03-11 18 0 ( 0%) 0 ( 0%)
>> 2016-03-11 19 0 ( 0%) 0 ( 0%)
>> 2016-03-11 20 0 ( 0%) 0 ( 0%)
>> 2016-03-11 21 0 ( 0%) 0 ( 0%)
>> 2016-03-11 22 0 ( 0%) 0 ( 0%)
>> 2016-03-11 23 0 ( 0%) 0 ( 0%)
>> Done. Report generated in 1 sec by sa-stats.pl, version 6256.
>>
>
> --
> Dave Funk University of Iowa
> <dbfunk (at) engineering.uiowa.edu> College of Engineering
> 319/335-5751 FAX: 319/384-0549 1256 Seamans Center
> Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
> #include <std_disclaimer.h>
> Better is not better, 'standard' is better. B{
Re: Missed spam, suggestions?
Posted by John Hardin <jh...@impsec.org>.
On Fri, 11 Mar 2016, Robert Chalmers wrote:
> Found a copy here …
> http://www.impsec.org/~jhardin/antispam/sa-stats.pl
Note that I also host a version that works with gzipped log files, if you
have compression enabled in your log rotator.
But that's not the latest. I don't know where the v1.03 David has came
from. David, if you'd care to email me your copy, I'll see about updating
the one I host.
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
If you ask amateurs to act as front-line security personnel,
you shouldn't be surprised when you get amateur security.
-- Bruce Schneier
-----------------------------------------------------------------------
84 days since the first successful real return to launch site (SpaceX)
Re: Missed spam, suggestions?
Posted by Robert Chalmers <ro...@chalmers.com.au>.
Found a copy here …
http://www.impsec.org/~jhardin/antispam/sa-stats.pl
So finally found the right one. It does seem to be all working ok - at least to my eye.
./Sa_Stats.pl --logdir /var/log --filename spamd.log --num 18
Email: 53 Autolearn: 14 AvgScore: 1.02 AvgScanTime: 6.14 sec
Spam: 20 Autolearn: 0 AvgScore: 4.15 AvgScanTime: 5.29 sec
Ham: 33 Autolearn: 14 AvgScore: -0.88 AvgScanTime: 6.65 sec
Time Spent Running SA: 0.09 hours
Time Spent Processing Spam: 0.03 hours
Time Spent Processing Ham: 0.06 hours
TOP SPAM RULES FIRED
------------------------------------------------------------------------------
RANK  RULE NAME                       COUNT  %OFMAIL  %OFSPAM  %OFHAM  AVGSCO
------------------------------------------------------------------------------
   1  HTML_MESSAGE                       20    52.83   100.00   24.24    4.15
   2  SPF_PASS                           17    43.40    85.00   18.18    3.76
   3  DCC_CHECK                          15    39.62    75.00   18.18    4.33
   4  BAYES_50                           14    26.42    70.00    0.00    3.86
   5  RDNS_NONE                          13    24.53    65.00    0.00    4.15
   6  SPF_HELO_PASS                      13    24.53    65.00    0.00    4.00
   7  T_REMOTE_IMAGE                      8    15.09    40.00    0.00    3.75
   8  DKIM_SIGNED                         6    45.28    30.00   54.55    3.17
   9  BAYES_999                           6    11.32    30.00    0.00    4.83
  10  BAYES_99                            6    11.32    30.00    0.00    4.83
  11  DKIM_VALID                          6    45.28    30.00   54.55    3.17
  12  RP_MATCHES_RCVD                     4    30.19    20.00   36.36    3.25
  13  DKIM_VALID_AU                       4    37.74    20.00   48.48    3.00
  14  HTML_IMAGE_RATIO_02                 3     5.66    15.00    0.00    3.67
  15  MPART_ALT_DIFF_COUNT                3     5.66    15.00    0.00    6.67
  16  MPART_ALT_DIFF                      2     3.77    10.00    0.00    6.50
  17  FROM_12LTRDOM                       2     3.77    10.00    0.00    3.00
  18  MORE_SEX                            2     3.77    10.00    0.00    5.00
------------------------------------------------------------------------------
TOP HAM RULES FIRED
------------------------------------------------------------------------------
RANK  RULE NAME                       COUNT  %OFMAIL  %OFSPAM  %OFHAM  AVGSCO
------------------------------------------------------------------------------
   1  BAYES_00                           32    60.38     0.00   96.97   -0.91
   2  HEADER_FROM_DIFFERENT_DOMAINS      29    56.60     5.00   87.88   -0.83
   3  DKIM_VALID                         18    45.28    30.00   54.55   -0.78
   4  DKIM_SIGNED                        18    45.28    30.00   54.55   -0.78
   5  DKIM_VALID_AU                      16    37.74    20.00   48.48   -0.88
   6  RP_MATCHES_RCVD                    12    30.19    20.00   36.36   -1.08
   7  HTML_MESSAGE                        8    52.83   100.00   24.24   -0.88
   8  DCC_CHECK                           6    39.62    75.00   18.18    0.17
   9  FREEMAIL_FORGED_FROMDOMAIN          6    11.32     0.00   18.18   -1.17
  10  SPF_PASS                            6    43.40    85.00   18.18   -1.17
  11  FREEMAIL_FROM                       6    13.21     5.00   18.18   -1.17
  12  UNPARSEABLE_RELAY                   3     5.66     0.00    9.09   -1.00
  13  DEAR_SOMETHING                      2     3.77     0.00    6.06    0.50
  14  MSGID_FROM_MTA_HEADER               1     1.89     0.00    3.03   -1.00
  15  HTML_FONT_LOW_CONTRAST              1     5.66    10.00    3.03    0.00
  16  DKIM_ADSP_CUSTOM_MED                1     1.89     0.00    3.03   -1.00
  17  BAYES_05                            1     1.89     0.00    3.03    0.00
  18  ALL_TRUSTED                         1     1.89     0.00    3.03   -2.00
------------------------------------------------------------------------------
> On 11 Mar 2016, at 15:33, Robert Chalmers <ro...@chalmers.com.au> wrote:
>
>
> Just a note - that server address isn’t responding at the moment. Maybe later. Hopefully only temporary.
Re: Missed spam, suggestions?
Posted by Robert Chalmers <ro...@chalmers.com.au>.
Just a note - that server address isn’t responding at the moment. Maybe later. Hopefully only temporary.
> On 11 Mar 2016, at 14:59, Dave Funk <db...@engineering.uiowa.edu> wrote:
>
> TL;DR
> You want Dallas Engelken's "sa-stats.pl" NOT the one from SA.
>
> This is confusing because there are two different programs named "sa-stats.pl".
>
> The one that comes with SpamAssassin (what you're referring to) is an engine stats reporting tool; does not do rule hits analysis.
>
> The tool that Charles Sprickman and I used is the one from Dallas Engelken.
> See: http://wiki.apache.org/spamassassin/StatsAndAnalyzers
> be sure to search that page for reference to Dallas Engelken.
>
>
>
> On Fri, 11 Mar 2016, Robert Chalmers wrote:
>
>> The sa-stats.pl I refer to is here.
>> https://spamassassin.apache.org/full/3.0.x/dist/tools/sa-stats.pl. It’s not the same as the ones shown in other posts. I don’t know what
>> that is.
>> and has an output like this.
>> zeus:~ robert$ perl sa-stats.pl
>> Report Title : SpamAssassin - Spam Statistics
>> Report Date : 2016-03-11
>> Period Beginning : Fri 11 Mar 00:00:00 2016
>> Period Ending : Sat 12 Mar 00:00:00 2016
>> Reporting Period : 24.00 hrs
>> --------------------------------------------------
>> Note: 'ham' = 'nonspam'
>> Total spam detected : 22 ( 51.16%)
>> Total ham accepted : 21 ( 48.84%)
>> -------------------
>> Total emails processed : 43 ( 2/hr)
>> Average spam threshold : 3.00
>> Average spam score : 4.46
>> Average ham score : -2.10
>> Spam kbytes processed : 397 ( 17 kb/hr)
>> Ham kbytes processed : 147 ( 6 kb/hr)
>> Total kbytes processed : 545 ( 23 kb/hr)
>> Spam analysis time : 339 s ( 14 s/hr)
>> Ham analysis time : 366 s ( 15 s/hr)
>> Total analysis time : 706 s ( 29 s/hr)
>> Statistics by Hour
>> ----------------------------------------------------
>> Hour Spam Ham
>> ------------- ----------------- --------------
>> 2016-03-11 00 0 ( 0%) 13 (100%)
>> 2016-03-11 01 0 ( 0%) 0 ( 0%)
>> 2016-03-11 02 2 (100%) 0 ( 0%)
>> 2016-03-11 03 4 (100%) 0 ( 0%)
>> 2016-03-11 04 4 ( 57%) 3 ( 42%)
>> 2016-03-11 05 6 ( 75%) 2 ( 25%)
>> 2016-03-11 06 6 (100%) 0 ( 0%)
>> 2016-03-11 07 0 ( 0%) 3 (100%)
>> 2016-03-11 08 0 ( 0%) 0 ( 0%)
>> 2016-03-11 09 0 ( 0%) 0 ( 0%)
>> 2016-03-11 10 0 ( 0%) 0 ( 0%)
>> 2016-03-11 11 0 ( 0%) 0 ( 0%)
>> 2016-03-11 12 0 ( 0%) 0 ( 0%)
>> 2016-03-11 13 0 ( 0%) 0 ( 0%)
>> 2016-03-11 14 0 ( 0%) 0 ( 0%)
>> 2016-03-11 15 0 ( 0%) 0 ( 0%)
>> 2016-03-11 16 0 ( 0%) 0 ( 0%)
>> 2016-03-11 17 0 ( 0%) 0 ( 0%)
>> 2016-03-11 18 0 ( 0%) 0 ( 0%)
>> 2016-03-11 19 0 ( 0%) 0 ( 0%)
>> 2016-03-11 20 0 ( 0%) 0 ( 0%)
>> 2016-03-11 21 0 ( 0%) 0 ( 0%)
>> 2016-03-11 22 0 ( 0%) 0 ( 0%)
>> 2016-03-11 23 0 ( 0%) 0 ( 0%)
>> Done. Report generated in 1 sec by sa-stats.pl, version 6256.
>>
>> On 10 Mar 2016, at 21:38, Erickarlo Porro <ep...@earthcam.com> wrote:
>> I would like to know how to get these stats too.
>> From: Robert Chalmers [mailto:robert@chalmers.com.au] Sent: Tuesday, March 08, 2016 5:25 AM
>> To: users@spamassassin.apache.org
>> Subject: Re: Missed spam, suggestions?
>> Can I ask, how are you getting these stats please?
>> Thanks
>> On 8 Mar 2016, at 05:11, David B Funk <db...@engineering.uiowa.edu> wrote:
>> On Mon, 7 Mar 2016, Charles Sprickman wrote:
>>
>> I’ve been running with some daily training for a little over a week and I’m seeing less spam in my inbox. I’ve
>> seen a few things slip through because bayes tipped them below the default score, these were two phishing emails.
>>
>> Here’s some rule stats for anyone interested:
>>
>> TOP SPAM RULES FIRED
>>
>> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>>
>> 1 TXREP 13171 8.47 40.38 91.00 72.91
>> 2 HTML_MESSAGE 12714 8.18 38.98 87.85 90.80
>> 3 DCC_CHECK 10593 6.81 32.48 73.19 33.78
>> 4 RDNS_NONE 10269 6.60 31.48 70.95 5.63
>> 5 SPF_HELO_PASS 10070 6.48 30.87 69.58 23.41
>> 6 URIBL_BLACK 9711 6.25 29.77 67.10 1.58
>> 7 BODY_NEWDOMAIN_FMBLA 9550 6.14 29.28 65.98 1.64
>> 8 FROM_NEWDOMAIN_FMBLA 9483 6.10 29.07 65.52 1.36
>> 9 BAYES_99 8486 5.46 26.02 58.63 1.18
>> 10 BAYES_999 8141 5.24 24.96 56.25 1.06
>>
>> TOP HAM RULES FIRED
>>
>> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>>
>> 1 HTML_MESSAGE 16473 9.13 50.51 87.85 90.80
>> 2 DKIM_SIGNED 13776 7.64 42.24 13.81 75.93
>> 3 TXREP 13228 7.33 40.56 91.00 72.91
>> 4 DKIM_VALID 12962 7.19 39.74 11.93 71.44
>> 5 RCVD_IN_DNSWL_NONE 9941 5.51 30.48 8.08 54.79
>> 6 DKIM_VALID_AU 8711 4.83 26.71 7.99 48.01
>> 7 BAYES_00 8390 4.65 25.72 1.84 46.24
>> 8 RCVD_IN_JMF_W 7369 4.09 22.59 2.54 40.62
>> 9 RCVD_IN_MSPIKE_WL 6713 3.72 20.58 4.39 37.00
>> 10 BAYES_50 6201 3.44 19.01 25.56 34.18
>> Based upon your stats it looks like you need more Bayes training. Your Bayes 00/99 hits should rank higher in the rules-fired
>> stats and BAYES_50 shouldn't be in the top-10 at all.
>> (of course if you've only been training for a week that would explain it).
>> For example, here's my top-10 hits (for a one month interval).
>> TOP SPAM RULES FIRED
>> ----------------------------------------------------------------------
>> RANK RULE NAME COUNT %OFMAIL %OFSPAM %OFHAM S/O
>> ----------------------------------------------------------------------
>> 1 T__BOTNET_NOTRUST 114907 60.32 86.81 42.66 0.5755
>> 2 BAYES_99 109138 32.98 82.45 0.01 0.9998
>> 3 BAYES_999 104903 31.70 79.25 0.01 0.9999
>> 4 HTML_MESSAGE 90850 79.41 68.63 86.59 0.3456
>> 5 URIBL_BLACK 90845 27.61 68.63 0.27 0.9942
>> 6 T_QUARANTINE_1 90640 27.40 68.47 0.02 0.9996
>> 7 URIBL_DBL_SPAM 79152 24.02 59.79 0.17 0.9956
>> 8 KAM_VERY_BLACK_DBL 74301 22.45 56.13 0.00 1.0000
>> 9 L_FROM_SPAMMER1k 73667 22.26 55.65 0.00 1.0000
>> 10 T__RECEIVED_1 72413 42.60 54.70 34.54 0.5135
>> TOP HAM RULES FIRED
>> ----------------------------------------------------------------------
>> RANK RULE NAME COUNT %OFMAIL %OFSPAM %OFHAM S/O
>> ----------------------------------------------------------------------
>> 1 BAYES_00 182674 56.03 2.11 91.97 0.0150
>> 2 HTML_MESSAGE 171992 79.41 68.63 86.59 0.3456
>> 3 SPF_PASS 136623 63.08 54.52 68.78 0.3457
>> 4 T_RP_MATCHES_RCVD 130879 53.75 35.54 65.89 0.2644
>> 5 T__RECEIVED_2 125492 53.76 39.62 63.18 0.2947
>> 6 DKIM_SIGNED 114808 38.57 9.72 57.80 0.1008
>> 7 DKIM_VALID 105385 34.70 7.16 53.06 0.0825
>> 8 RCVD_IN_DNSWL_NONE 92951 29.90 4.56 46.80 0.0609
>> 9 T__BOTNET_NOTRUST 84741 60.32 86.81 42.66 0.5755
>> 10 KHOP_RCVD_TRUST 84623 26.44 2.19 42.60 0.0331
>> Note how highly BAYES 00/99 ranked. What you don't see is that BAYES_50 is way down in the mud (below 50 rank).
>> BTW, this is with a Bayes that is mostly fed via auto-learning. I occasionally
>> hand feed corner cases that get mis-classified (usually things like phishes, or conference announcements that can look shaky).
>> --
>> Dave Funk University of Iowa
>> <dbfunk (at) engineering.uiowa.edu> College of Engineering
>> 319/335-5751 FAX: 319/384-0549 1256 Seamans Center
>> Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
>> #include <std_disclaimer.h>
>> Better is not better, 'standard' is better. B{
>>
>> Robert Chalmers
>> robert@chalmers.com.au Quantum Radio: http://tinyurl.com/lwwddov
>> Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11. XCode 7.2.1
>> 2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay
>>
>
> --
> Dave Funk University of Iowa
> <dbfunk (at) engineering.uiowa.edu> College of Engineering
> 319/335-5751 FAX: 319/384-0549 1256 Seamans Center
> Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
> #include <std_disclaimer.h>
> Better is not better, 'standard' is better. B{
Robert Chalmers
robert@chalmers.com.au Quantum Radio: http://tinyurl.com/lwwddov
Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11. XCode 7.2.1
2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay
Re: Missed spam, suggestions?
Posted by Dave Funk <db...@engineering.uiowa.edu>.
TL;DR
You want Dallas Engelken's "sa-stats.pl" NOT the one from SA.
This is confusing because there are two different programs named
"sa-stats.pl".
The one that comes with SpamAssassin (what you're referring to) is an
engine stats reporting tool; it does not do rule-hit analysis.
The tool that Charles Sprickman and I used is the one from Dallas
Engelken.
See: http://wiki.apache.org/spamassassin/StatsAndAnalyzers
Be sure to search that page for the reference to Dallas Engelken.
On Fri, 11 Mar 2016, Robert Chalmers wrote:
> The sa-stats.pl I refer to is here.
> https://spamassassin.apache.org/full/3.0.x/dist/tools/sa-stats.pl. It’s not the same as the ones shown in other posts. I don’t know what
> that is.
>
> and has an output like this.
>
> zeus:~ robert$ perl sa-stats.pl
> Report Title : SpamAssassin - Spam Statistics
> Report Date : 2016-03-11
> Period Beginning : Fri 11 Mar 00:00:00 2016
> Period Ending : Sat 12 Mar 00:00:00 2016
>
> Reporting Period : 24.00 hrs
> --------------------------------------------------
>
> Note: 'ham' = 'nonspam'
>
> Total spam detected : 22 ( 51.16%)
> Total ham accepted : 21 ( 48.84%)
> -------------------
> Total emails processed : 43 ( 2/hr)
>
> Average spam threshold : 3.00
> Average spam score : 4.46
> Average ham score : -2.10
>
> Spam kbytes processed : 397 ( 17 kb/hr)
> Ham kbytes processed : 147 ( 6 kb/hr)
> Total kbytes processed : 545 ( 23 kb/hr)
>
> Spam analysis time : 339 s ( 14 s/hr)
> Ham analysis time : 366 s ( 15 s/hr)
> Total analysis time : 706 s ( 29 s/hr)
>
>
> Statistics by Hour
> ----------------------------------------------------
> Hour Spam Ham
> ------------- ----------------- --------------
> 2016-03-11 00 0 ( 0%) 13 (100%)
> 2016-03-11 01 0 ( 0%) 0 ( 0%)
> 2016-03-11 02 2 (100%) 0 ( 0%)
> 2016-03-11 03 4 (100%) 0 ( 0%)
> 2016-03-11 04 4 ( 57%) 3 ( 42%)
> 2016-03-11 05 6 ( 75%) 2 ( 25%)
> 2016-03-11 06 6 (100%) 0 ( 0%)
> 2016-03-11 07 0 ( 0%) 3 (100%)
> 2016-03-11 08 0 ( 0%) 0 ( 0%)
> 2016-03-11 09 0 ( 0%) 0 ( 0%)
> 2016-03-11 10 0 ( 0%) 0 ( 0%)
> 2016-03-11 11 0 ( 0%) 0 ( 0%)
> 2016-03-11 12 0 ( 0%) 0 ( 0%)
> 2016-03-11 13 0 ( 0%) 0 ( 0%)
> 2016-03-11 14 0 ( 0%) 0 ( 0%)
> 2016-03-11 15 0 ( 0%) 0 ( 0%)
> 2016-03-11 16 0 ( 0%) 0 ( 0%)
> 2016-03-11 17 0 ( 0%) 0 ( 0%)
> 2016-03-11 18 0 ( 0%) 0 ( 0%)
> 2016-03-11 19 0 ( 0%) 0 ( 0%)
> 2016-03-11 20 0 ( 0%) 0 ( 0%)
> 2016-03-11 21 0 ( 0%) 0 ( 0%)
> 2016-03-11 22 0 ( 0%) 0 ( 0%)
> 2016-03-11 23 0 ( 0%) 0 ( 0%)
>
>
> Done. Report generated in 1 sec by sa-stats.pl, version 6256.
>
> On 10 Mar 2016, at 21:38, Erickarlo Porro <ep...@earthcam.com> wrote:
>
> I would like to know how to get these stats too.
>
> From: Robert Chalmers [mailto:robert@chalmers.com.au]
> Sent: Tuesday, March 08, 2016 5:25 AM
> To: users@spamassassin.apache.org
> Subject: Re: Missed spam, suggestions?
>
> Can I ask, how are you getting these stats please?
>
> Thanks
> On 8 Mar 2016, at 05:11, David B Funk <db...@engineering.uiowa.edu> wrote:
>
> On Mon, 7 Mar 2016, Charles Sprickman wrote:
>
>
> I’ve been running with some daily training for a little over a week and I’m seeing less spam in my inbox. I’ve
> seen a few things slip through because bayes tipped them below the default score, these were two phishing emails.
>
> Here’s some rule stats for anyone interested:
>
> TOP SPAM RULES FIRED
>
> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>
> 1 TXREP 13171 8.47 40.38 91.00 72.91
> 2 HTML_MESSAGE 12714 8.18 38.98 87.85 90.80
> 3 DCC_CHECK 10593 6.81 32.48 73.19 33.78
> 4 RDNS_NONE 10269 6.60 31.48 70.95 5.63
> 5 SPF_HELO_PASS 10070 6.48 30.87 69.58 23.41
> 6 URIBL_BLACK 9711 6.25 29.77 67.10 1.58
> 7 BODY_NEWDOMAIN_FMBLA 9550 6.14 29.28 65.98 1.64
> 8 FROM_NEWDOMAIN_FMBLA 9483 6.10 29.07 65.52 1.36
> 9 BAYES_99 8486 5.46 26.02 58.63 1.18
> 10 BAYES_999 8141 5.24 24.96 56.25 1.06
>
> TOP HAM RULES FIRED
>
> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>
> 1 HTML_MESSAGE 16473 9.13 50.51 87.85 90.80
> 2 DKIM_SIGNED 13776 7.64 42.24 13.81 75.93
> 3 TXREP 13228 7.33 40.56 91.00 72.91
> 4 DKIM_VALID 12962 7.19 39.74 11.93 71.44
> 5 RCVD_IN_DNSWL_NONE 9941 5.51 30.48 8.08 54.79
> 6 DKIM_VALID_AU 8711 4.83 26.71 7.99 48.01
> 7 BAYES_00 8390 4.65 25.72 1.84 46.24
> 8 RCVD_IN_JMF_W 7369 4.09 22.59 2.54 40.62
> 9 RCVD_IN_MSPIKE_WL 6713 3.72 20.58 4.39 37.00
> 10 BAYES_50 6201 3.44 19.01 25.56 34.18
>
>
> Based upon your stats it looks like you need more Bayes training. Your Bayes 00/99 hits should rank higher in the rules-fired
> stats and BAYES_50 shouldn't be in the top-10 at all.
> (of course if you've only been training for a week that would explain it).
>
> For example, here's my top-10 hits (for a one month interval).
>
> TOP SPAM RULES FIRED
> ----------------------------------------------------------------------
> RANK RULE NAME COUNT %OFMAIL %OFSPAM %OFHAM S/O
> ----------------------------------------------------------------------
> 1 T__BOTNET_NOTRUST 114907 60.32 86.81 42.66 0.5755
> 2 BAYES_99 109138 32.98 82.45 0.01 0.9998
> 3 BAYES_999 104903 31.70 79.25 0.01 0.9999
> 4 HTML_MESSAGE 90850 79.41 68.63 86.59 0.3456
> 5 URIBL_BLACK 90845 27.61 68.63 0.27 0.9942
> 6 T_QUARANTINE_1 90640 27.40 68.47 0.02 0.9996
> 7 URIBL_DBL_SPAM 79152 24.02 59.79 0.17 0.9956
> 8 KAM_VERY_BLACK_DBL 74301 22.45 56.13 0.00 1.0000
> 9 L_FROM_SPAMMER1k 73667 22.26 55.65 0.00 1.0000
> 10 T__RECEIVED_1 72413 42.60 54.70 34.54 0.5135
>
> TOP HAM RULES FIRED
> ----------------------------------------------------------------------
> RANK RULE NAME COUNT %OFMAIL %OFSPAM %OFHAM S/O
> ----------------------------------------------------------------------
> 1 BAYES_00 182674 56.03 2.11 91.97 0.0150
> 2 HTML_MESSAGE 171992 79.41 68.63 86.59 0.3456
> 3 SPF_PASS 136623 63.08 54.52 68.78 0.3457
> 4 T_RP_MATCHES_RCVD 130879 53.75 35.54 65.89 0.2644
> 5 T__RECEIVED_2 125492 53.76 39.62 63.18 0.2947
> 6 DKIM_SIGNED 114808 38.57 9.72 57.80 0.1008
> 7 DKIM_VALID 105385 34.70 7.16 53.06 0.0825
> 8 RCVD_IN_DNSWL_NONE 92951 29.90 4.56 46.80 0.0609
> 9 T__BOTNET_NOTRUST 84741 60.32 86.81 42.66 0.5755
> 10 KHOP_RCVD_TRUST 84623 26.44 2.19 42.60 0.0331
>
> Note how highly BAYES 00/99 ranked. What you don't see is that BAYES_50 is way down in the mud (below 50 rank).
>
> BTW, this is with a Bayes that is mostly fed via auto-learning. I occasionally
> hand feed corner cases that get mis-classified (usually things like phishes, or conference announcements that can look shaky).
>
>
> --
> Dave Funk University of Iowa
> <dbfunk (at) engineering.uiowa.edu> College of Engineering
> 319/335-5751 FAX: 319/384-0549 1256 Seamans Center
> Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
> #include <std_disclaimer.h>
> Better is not better, 'standard' is better. B{
>
>
> Robert Chalmers
> robert@chalmers.com.au Quantum Radio: http://tinyurl.com/lwwddov
> Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11. XCode 7.2.1
> 2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay
--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
Re: Missed spam, suggestions?
Posted by Robert Chalmers <ro...@chalmers.com.au>.
The sa-stats.pl I’m referring to is here:
https://spamassassin.apache.org/full/3.0.x/dist/tools/sa-stats.pl
It’s not the same as the one shown in other posts; I don’t know what that one is.
Its output looks like this:
zeus:~ robert$ perl sa-stats.pl
Report Title : SpamAssassin - Spam Statistics
Report Date : 2016-03-11
Period Beginning : Fri 11 Mar 00:00:00 2016
Period Ending : Sat 12 Mar 00:00:00 2016
Reporting Period : 24.00 hrs
--------------------------------------------------
Note: 'ham' = 'nonspam'
Total spam detected : 22 ( 51.16%)
Total ham accepted : 21 ( 48.84%)
-------------------
Total emails processed : 43 ( 2/hr)
Average spam threshold : 3.00
Average spam score : 4.46
Average ham score : -2.10
Spam kbytes processed : 397 ( 17 kb/hr)
Ham kbytes processed : 147 ( 6 kb/hr)
Total kbytes processed : 545 ( 23 kb/hr)
Spam analysis time : 339 s ( 14 s/hr)
Ham analysis time : 366 s ( 15 s/hr)
Total analysis time : 706 s ( 29 s/hr)
Statistics by Hour
----------------------------------------------------
Hour Spam Ham
------------- ----------------- --------------
2016-03-11 00 0 ( 0%) 13 (100%)
2016-03-11 01 0 ( 0%) 0 ( 0%)
2016-03-11 02 2 (100%) 0 ( 0%)
2016-03-11 03 4 (100%) 0 ( 0%)
2016-03-11 04 4 ( 57%) 3 ( 42%)
2016-03-11 05 6 ( 75%) 2 ( 25%)
2016-03-11 06 6 (100%) 0 ( 0%)
2016-03-11 07 0 ( 0%) 3 (100%)
2016-03-11 08 0 ( 0%) 0 ( 0%)
2016-03-11 09 0 ( 0%) 0 ( 0%)
2016-03-11 10 0 ( 0%) 0 ( 0%)
2016-03-11 11 0 ( 0%) 0 ( 0%)
2016-03-11 12 0 ( 0%) 0 ( 0%)
2016-03-11 13 0 ( 0%) 0 ( 0%)
2016-03-11 14 0 ( 0%) 0 ( 0%)
2016-03-11 15 0 ( 0%) 0 ( 0%)
2016-03-11 16 0 ( 0%) 0 ( 0%)
2016-03-11 17 0 ( 0%) 0 ( 0%)
2016-03-11 18 0 ( 0%) 0 ( 0%)
2016-03-11 19 0 ( 0%) 0 ( 0%)
2016-03-11 20 0 ( 0%) 0 ( 0%)
2016-03-11 21 0 ( 0%) 0 ( 0%)
2016-03-11 22 0 ( 0%) 0 ( 0%)
2016-03-11 23 0 ( 0%) 0 ( 0%)
Done. Report generated in 1 sec by sa-stats.pl, version 6256.
> On 10 Mar 2016, at 21:38, Erickarlo Porro <ep...@earthcam.com> wrote:
>
> I would like to know how to get these stats too.
>
> From: Robert Chalmers [mailto:robert@chalmers.com.au]
> Sent: Tuesday, March 08, 2016 5:25 AM
> To: users@spamassassin.apache.org
> Subject: Re: Missed spam, suggestions?
>
> Can I ask, how are you getting these stats please?
>
> Thanks
> On 8 Mar 2016, at 05:11, David B Funk <dbfunk@engineering.uiowa.edu> wrote:
>
> On Mon, 7 Mar 2016, Charles Sprickman wrote:
>
>
> I’ve been running with some daily training for a little over a week and I’m seeing less spam in my inbox. I’ve seen a few things slip through because bayes tipped them below the default score, these were two phishing emails.
>
> Here’s some rule stats for anyone interested:
>
> TOP SPAM RULES FIRED
>
> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>
> 1 TXREP 13171 8.47 40.38 91.00 72.91
> 2 HTML_MESSAGE 12714 8.18 38.98 87.85 90.80
> 3 DCC_CHECK 10593 6.81 32.48 73.19 33.78
> 4 RDNS_NONE 10269 6.60 31.48 70.95 5.63
> 5 SPF_HELO_PASS 10070 6.48 30.87 69.58 23.41
> 6 URIBL_BLACK 9711 6.25 29.77 67.10 1.58
> 7 BODY_NEWDOMAIN_FMBLA 9550 6.14 29.28 65.98 1.64
> 8 FROM_NEWDOMAIN_FMBLA 9483 6.10 29.07 65.52 1.36
> 9 BAYES_99 8486 5.46 26.02 58.63 1.18
> 10 BAYES_999 8141 5.24 24.96 56.25 1.06
>
> TOP HAM RULES FIRED
>
> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>
> 1 HTML_MESSAGE 16473 9.13 50.51 87.85 90.80
> 2 DKIM_SIGNED 13776 7.64 42.24 13.81 75.93
> 3 TXREP 13228 7.33 40.56 91.00 72.91
> 4 DKIM_VALID 12962 7.19 39.74 11.93 71.44
> 5 RCVD_IN_DNSWL_NONE 9941 5.51 30.48 8.08 54.79
> 6 DKIM_VALID_AU 8711 4.83 26.71 7.99 48.01
> 7 BAYES_00 8390 4.65 25.72 1.84 46.24
> 8 RCVD_IN_JMF_W 7369 4.09 22.59 2.54 40.62
> 9 RCVD_IN_MSPIKE_WL 6713 3.72 20.58 4.39 37.00
> 10 BAYES_50 6201 3.44 19.01 25.56 34.18
>
>
> Based upon your stats it looks like you need more Bayes training. Your Bayes 00/99 hits should rank higher in the rules-fired stats and BAYES_50 shouldn't be in the top-10 at all.
> (of course if you've only been training for a week that would explain it).
>
> For example, here's my top-10 hits (for a one month interval).
>
> TOP SPAM RULES FIRED
> ----------------------------------------------------------------------
> RANK RULE NAME COUNT %OFMAIL %OFSPAM %OFHAM S/O
> ----------------------------------------------------------------------
> 1 T__BOTNET_NOTRUST 114907 60.32 86.81 42.66 0.5755
> 2 BAYES_99 109138 32.98 82.45 0.01 0.9998
> 3 BAYES_999 104903 31.70 79.25 0.01 0.9999
> 4 HTML_MESSAGE 90850 79.41 68.63 86.59 0.3456
> 5 URIBL_BLACK 90845 27.61 68.63 0.27 0.9942
> 6 T_QUARANTINE_1 90640 27.40 68.47 0.02 0.9996
> 7 URIBL_DBL_SPAM 79152 24.02 59.79 0.17 0.9956
> 8 KAM_VERY_BLACK_DBL 74301 22.45 56.13 0.00 1.0000
> 9 L_FROM_SPAMMER1k 73667 22.26 55.65 0.00 1.0000
> 10 T__RECEIVED_1 72413 42.60 54.70 34.54 0.5135
>
> TOP HAM RULES FIRED
> ----------------------------------------------------------------------
> RANK RULE NAME COUNT %OFMAIL %OFSPAM %OFHAM S/O
> ----------------------------------------------------------------------
> 1 BAYES_00 182674 56.03 2.11 91.97 0.0150
> 2 HTML_MESSAGE 171992 79.41 68.63 86.59 0.3456
> 3 SPF_PASS 136623 63.08 54.52 68.78 0.3457
> 4 T_RP_MATCHES_RCVD 130879 53.75 35.54 65.89 0.2644
> 5 T__RECEIVED_2 125492 53.76 39.62 63.18 0.2947
> 6 DKIM_SIGNED 114808 38.57 9.72 57.80 0.1008
> 7 DKIM_VALID 105385 34.70 7.16 53.06 0.0825
> 8 RCVD_IN_DNSWL_NONE 92951 29.90 4.56 46.80 0.0609
> 9 T__BOTNET_NOTRUST 84741 60.32 86.81 42.66 0.5755
> 10 KHOP_RCVD_TRUST 84623 26.44 2.19 42.60 0.0331
>
> Note how highly BAYES 00/99 ranked. What you don't see is that BAYES_50 is way down in the mud (below 50 rank).
>
> BTW, this is with a Bayes that is mostly fed via auto-learning. I occasionally
> hand feed corner cases that get mis-classified (usually things like phishes, or conference announcements that can look shaky).
>
>
> --
> Dave Funk University of Iowa
> <dbfunk (at) engineering.uiowa.edu> College of Engineering
> 319/335-5751 FAX: 319/384-0549 1256 Seamans Center
> Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
> #include <std_disclaimer.h>
> Better is not better, 'standard' is better. B{
>
> Robert Chalmers
> robert@chalmers.com.au Quantum Radio: http://tinyurl.com/lwwddov
> Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11. XCode 7.2.1
> 2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay
Robert Chalmers
robert@chalmers.com.au Quantum Radio: http://tinyurl.com/lwwddov
Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11. XCode 7.2.1
2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay
Re: Missed spam, suggestions?
Posted by Robert Chalmers <ro...@chalmers.com.au>.
sa-stats.pl
It is sometimes part of the SpamAssassin package, so you may have to search for it on your system; otherwise, it’s available via CPAN.
> On 10 Mar 2016, at 21:38, Erickarlo Porro <ep...@earthcam.com> wrote:
>
> I would like to know how to get these stats too.
>
> From: Robert Chalmers [mailto:robert@chalmers.com.au]
> Sent: Tuesday, March 08, 2016 5:25 AM
> To: users@spamassassin.apache.org
> Subject: Re: Missed spam, suggestions?
>
> Can I ask, how are you getting these stats please?
>
> Thanks
> On 8 Mar 2016, at 05:11, David B Funk <dbfunk@engineering.uiowa.edu> wrote:
>
> On Mon, 7 Mar 2016, Charles Sprickman wrote:
>
>
> I’ve been running with some daily training for a little over a week and I’m seeing less spam in my inbox. I’ve seen a few things slip through because bayes tipped them below the default score, these were two phishing emails.
>
> Here’s some rule stats for anyone interested:
>
> TOP SPAM RULES FIRED
>
> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>
> 1 TXREP 13171 8.47 40.38 91.00 72.91
> 2 HTML_MESSAGE 12714 8.18 38.98 87.85 90.80
> 3 DCC_CHECK 10593 6.81 32.48 73.19 33.78
> 4 RDNS_NONE 10269 6.60 31.48 70.95 5.63
> 5 SPF_HELO_PASS 10070 6.48 30.87 69.58 23.41
> 6 URIBL_BLACK 9711 6.25 29.77 67.10 1.58
> 7 BODY_NEWDOMAIN_FMBLA 9550 6.14 29.28 65.98 1.64
> 8 FROM_NEWDOMAIN_FMBLA 9483 6.10 29.07 65.52 1.36
> 9 BAYES_99 8486 5.46 26.02 58.63 1.18
> 10 BAYES_999 8141 5.24 24.96 56.25 1.06
>
> TOP HAM RULES FIRED
>
> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>
> 1 HTML_MESSAGE 16473 9.13 50.51 87.85 90.80
> 2 DKIM_SIGNED 13776 7.64 42.24 13.81 75.93
> 3 TXREP 13228 7.33 40.56 91.00 72.91
> 4 DKIM_VALID 12962 7.19 39.74 11.93 71.44
> 5 RCVD_IN_DNSWL_NONE 9941 5.51 30.48 8.08 54.79
> 6 DKIM_VALID_AU 8711 4.83 26.71 7.99 48.01
> 7 BAYES_00 8390 4.65 25.72 1.84 46.24
> 8 RCVD_IN_JMF_W 7369 4.09 22.59 2.54 40.62
> 9 RCVD_IN_MSPIKE_WL 6713 3.72 20.58 4.39 37.00
> 10 BAYES_50 6201 3.44 19.01 25.56 34.18
>
>
> Based upon your stats it looks like you need more Bayes training. Your Bayes 00/99 hits should rank higher in the rules-fired stats and BAYES_50 shouldn't be in the top-10 at all.
> (of course if you've only been training for a week that would explain it).
>
> For example, here's my top-10 hits (for a one month interval).
>
> TOP SPAM RULES FIRED
> ----------------------------------------------------------------------
> RANK RULE NAME COUNT %OFMAIL %OFSPAM %OFHAM S/O
> ----------------------------------------------------------------------
> 1 T__BOTNET_NOTRUST 114907 60.32 86.81 42.66 0.5755
> 2 BAYES_99 109138 32.98 82.45 0.01 0.9998
> 3 BAYES_999 104903 31.70 79.25 0.01 0.9999
> 4 HTML_MESSAGE 90850 79.41 68.63 86.59 0.3456
> 5 URIBL_BLACK 90845 27.61 68.63 0.27 0.9942
> 6 T_QUARANTINE_1 90640 27.40 68.47 0.02 0.9996
> 7 URIBL_DBL_SPAM 79152 24.02 59.79 0.17 0.9956
> 8 KAM_VERY_BLACK_DBL 74301 22.45 56.13 0.00 1.0000
> 9 L_FROM_SPAMMER1k 73667 22.26 55.65 0.00 1.0000
> 10 T__RECEIVED_1 72413 42.60 54.70 34.54 0.5135
>
> TOP HAM RULES FIRED
> ----------------------------------------------------------------------
> RANK RULE NAME COUNT %OFMAIL %OFSPAM %OFHAM S/O
> ----------------------------------------------------------------------
> 1 BAYES_00 182674 56.03 2.11 91.97 0.0150
> 2 HTML_MESSAGE 171992 79.41 68.63 86.59 0.3456
> 3 SPF_PASS 136623 63.08 54.52 68.78 0.3457
> 4 T_RP_MATCHES_RCVD 130879 53.75 35.54 65.89 0.2644
> 5 T__RECEIVED_2 125492 53.76 39.62 63.18 0.2947
> 6 DKIM_SIGNED 114808 38.57 9.72 57.80 0.1008
> 7 DKIM_VALID 105385 34.70 7.16 53.06 0.0825
> 8 RCVD_IN_DNSWL_NONE 92951 29.90 4.56 46.80 0.0609
> 9 T__BOTNET_NOTRUST 84741 60.32 86.81 42.66 0.5755
> 10 KHOP_RCVD_TRUST 84623 26.44 2.19 42.60 0.0331
>
> Note how highly BAYES 00/99 ranked. What you don't see is that BAYES_50 is way down in the mud (below 50 rank).
>
> BTW, this is with a Bayes that is mostly fed via auto-learning. I occasionally
> hand feed corner cases that get mis-classified (usually things like phishes, or conference announcements that can look shaky).
>
>
> --
> Dave Funk University of Iowa
> <dbfunk (at) engineering.uiowa.edu> College of Engineering
> 319/335-5751 FAX: 319/384-0549 1256 Seamans Center
> Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
> #include <std_disclaimer.h>
> Better is not better, 'standard' is better. B{
>
> Robert Chalmers
> robert@chalmers.com.au Quantum Radio: http://tinyurl.com/lwwddov
> Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11. XCode 7.2.1
> 2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay
Robert Chalmers
robert@chalmers.com.au Quantum Radio: http://tinyurl.com/lwwddov
Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11. XCode 7.2.1
2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay
Re: sa-stats log analyzer (RE: Missed spam, suggestions?)
Posted by "robert@chalmers.com.au" <ro...@chalmers.com.au>.
The rulesemporium site appears to be down.
If anyone has a newer version, it might be good to post it somewhere. My site, for example?
Robert
Sent from my iPad
> On 11 Mar 2016, at 04:17, David B Funk <db...@engineering.uiowa.edu> wrote:
>
> That's the output from Dallas Engelken's "sa-stats.pl" log analyzer.
> You feed it a segment of your spamd logs and it gives you
> those rule hit statistics.
>
> See: http://wiki.apache.org/spamassassin/StatsAndAnalyzers
>
> Looking at that wiki page, I noticed that the copy available is v0.93.
> I've got v1.03.
> Does anybody know what the newest one last available on the rulesemporium site was? Anybody got something newer than v1.03?
>
> I've done a bit of hacking to my copy (such as adding the S/O ratio stats).
>
>
>> On Thu, 10 Mar 2016, Erickarlo Porro wrote:
>>
>> I would like to know how to get these stats too.
>>
>> From: Robert Chalmers [mailto:robert@chalmers.com.au]
>> Sent: Tuesday, March 08, 2016 5:25 AM
>> To: users@spamassassin.apache.org
>> Subject: Re: Missed spam, suggestions?
>>
>> Can I ask, how are you getting these stats please?
>>
>> Thanks
>>
>> On 8 Mar 2016, at 05:11, David B Funk <db...@engineering.uiowa.edu> wrote:
>>
>> On Mon, 7 Mar 2016, Charles Sprickman wrote:
>>
>> I’ve been running with some daily training for a little over a week and I’m seeing less spam in my
>> inbox. I’ve seen a few things slip through because bayes tipped them below the default score, these
>> were two phishing emails.
>>
>> Here’s some rule stats for anyone interested:
>>
>> TOP SPAM RULES FIRED
>>
>> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>>
>> 1 TXREP 13171 8.47 40.38 91.00 72.91
>> 2 HTML_MESSAGE 12714 8.18 38.98 87.85 90.80
>> 3 DCC_CHECK 10593 6.81 32.48 73.19 33.78
>> 4 RDNS_NONE 10269 6.60 31.48 70.95 5.63
>> 5 SPF_HELO_PASS 10070 6.48 30.87 69.58 23.41
>> 6 URIBL_BLACK 9711 6.25 29.77 67.10 1.58
>> 7 BODY_NEWDOMAIN_FMBLA 9550 6.14 29.28 65.98 1.64
>> 8 FROM_NEWDOMAIN_FMBLA 9483 6.10 29.07 65.52 1.36
>> 9 BAYES_99 8486 5.46 26.02 58.63 1.18
>> 10 BAYES_999 8141 5.24 24.96 56.25 1.06
>>
>> TOP HAM RULES FIRED
>>
>> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>>
>> 1 HTML_MESSAGE 16473 9.13 50.51 87.85 90.80
>> 2 DKIM_SIGNED 13776 7.64 42.24 13.81 75.93
>> 3 TXREP 13228 7.33 40.56 91.00 72.91
>> 4 DKIM_VALID 12962 7.19 39.74 11.93 71.44
>> 5 RCVD_IN_DNSWL_NONE 9941 5.51 30.48 8.08 54.79
>> 6 DKIM_VALID_AU 8711 4.83 26.71 7.99 48.01
>> 7 BAYES_00 8390 4.65 25.72 1.84 46.24
>> 8 RCVD_IN_JMF_W 7369 4.09 22.59 2.54 40.62
>> 9 RCVD_IN_MSPIKE_WL 6713 3.72 20.58 4.39 37.00
>> 10 BAYES_50 6201 3.44 19.01 25.56 34.18
>> Based upon your stats it looks like you need more Bayes training. Your Bayes 00/99 hits should rank higher in the
>> rules-fired stats and BAYES_50 shouldn't be in the top-10 at all.
>> (of course if you've only been training for a week that would explain it).
>> For example, here's my top-10 hits (for a one month interval).
>> TOP SPAM RULES FIRED
>> ----------------------------------------------------------------------
>> RANK RULE NAME COUNT %OFMAIL %OFSPAM %OFHAM S/O
>> ----------------------------------------------------------------------
>> 1 T__BOTNET_NOTRUST 114907 60.32 86.81 42.66 0.5755
>> 2 BAYES_99 109138 32.98 82.45 0.01 0.9998
>> 3 BAYES_999 104903 31.70 79.25 0.01 0.9999
>> 4 HTML_MESSAGE 90850 79.41 68.63 86.59 0.3456
>> 5 URIBL_BLACK 90845 27.61 68.63 0.27 0.9942
>> 6 T_QUARANTINE_1 90640 27.40 68.47 0.02 0.9996
>> 7 URIBL_DBL_SPAM 79152 24.02 59.79 0.17 0.9956
>> 8 KAM_VERY_BLACK_DBL 74301 22.45 56.13 0.00 1.0000
>> 9 L_FROM_SPAMMER1k 73667 22.26 55.65 0.00 1.0000
>> 10 T__RECEIVED_1 72413 42.60 54.70 34.54 0.5135
>> TOP HAM RULES FIRED
>> ----------------------------------------------------------------------
>> RANK RULE NAME COUNT %OFMAIL %OFSPAM %OFHAM S/O
>> ----------------------------------------------------------------------
>> 1 BAYES_00 182674 56.03 2.11 91.97 0.0150
>> 2 HTML_MESSAGE 171992 79.41 68.63 86.59 0.3456
>> 3 SPF_PASS 136623 63.08 54.52 68.78 0.3457
>> 4 T_RP_MATCHES_RCVD 130879 53.75 35.54 65.89 0.2644
>> 5 T__RECEIVED_2 125492 53.76 39.62 63.18 0.2947
>> 6 DKIM_SIGNED 114808 38.57 9.72 57.80 0.1008
>> 7 DKIM_VALID 105385 34.70 7.16 53.06 0.0825
>> 8 RCVD_IN_DNSWL_NONE 92951 29.90 4.56 46.80 0.0609
>> 9 T__BOTNET_NOTRUST 84741 60.32 86.81 42.66 0.5755
>> 10 KHOP_RCVD_TRUST 84623 26.44 2.19 42.60 0.0331
>> Note how highly BAYES 00/99 ranked. What you don't see is that BAYES_50 is way down in the mud (below 50 rank).
>> BTW, this is with a Bayes that is mostly fed via auto-learning. I occasionally
>> hand feed corner cases that get mis-classified (usually things like phishes, or conference announcements that can
>> look shaky).
>> --
>> Dave Funk University of Iowa
>> <dbfunk (at) engineering.uiowa.edu> College of Engineering
>> 319/335-5751 FAX: 319/384-0549 1256 Seamans Center
>> Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
>> #include <std_disclaimer.h>
>> Better is not better, 'standard' is better. B{
>>
>> Robert Chalmers
>> robert@chalmers.com.au Quantum Radio: http://tinyurl.com/lwwddov
>> Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11. XCode 7.2.1
>> 2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay
>>
>>
>>
>>
>
> --
> Dave Funk University of Iowa
> <dbfunk (at) engineering.uiowa.edu> College of Engineering
> 319/335-5751 FAX: 319/384-0549 1256 Seamans Center
> Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
> #include <std_disclaimer.h>
> Better is not better, 'standard' is better. B{
sa-stats log analyzer (RE: Missed spam, suggestions?)
Posted by David B Funk <db...@engineering.uiowa.edu>.
That's the output from Dallas Engelken's "sa-stats.pl" log analyzer.
You feed it a segment of your spamd logs and it gives you
those rule hit statistics.
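For anyone who just wants a feel for what such a log analyzer does, here is a minimal Python sketch. It is not Dallas Engelken's script. It assumes spamd writes lines of the form "spamd: result: Y 14 - RULE_A,RULE_B scantime=..." (with "." instead of "Y" for ham); check your own logs and adjust the pattern if yours differ. The names count_rule_hits and top_rules are made up for this example.

```python
import re
from collections import Counter

# Assumed shape of a spamd "result:" log line; adjust to match your own logs.
RESULT_RE = re.compile(
    r"spamd: result: (?P<flag>[Y.]) +(?P<score>-?\d+(?:\.\d+)?) - (?P<rules>\S*) scantime="
)

def count_rule_hits(lines):
    """Tally rule hits separately for spam and ham messages."""
    spam_hits, ham_hits = Counter(), Counter()
    n_spam = n_ham = 0
    for line in lines:
        m = RESULT_RE.search(line)
        if not m:
            continue  # not a result line (connection notices, etc.)
        rules = [r for r in m.group("rules").split(",") if r]
        if m.group("flag") == "Y":
            n_spam += 1
            spam_hits.update(rules)
        else:
            n_ham += 1
            ham_hits.update(rules)
    return spam_hits, ham_hits, n_spam, n_ham

def top_rules(hits, total, n=10):
    """Return (rule, count, percent of messages in this class) for the top n rules."""
    return [
        (rule, count, 100.0 * count / total if total else 0.0)
        for rule, count in hits.most_common(n)
    ]

if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python rule_hits.py /var/log/maillog
    with open(sys.argv[1], errors="replace") as fh:
        spam, ham, n_spam, n_ham = count_rule_hits(fh)
    for title, hits, total in (("SPAM", spam, n_spam), ("HAM", ham, n_ham)):
        print(f"TOP {title} RULES FIRED")
        for rank, (rule, count, pct) in enumerate(top_rules(hits, total), 1):
            print(f"{rank:4d} {rule:<24} {count:8d} {pct:7.2f}")
```

The percentages here correspond to the %OFSPAM and %OFHAM columns only; the real tool reports more columns than this sketch does.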
See: http://wiki.apache.org/spamassassin/StatsAndAnalyzers
Looking at that wiki page, I noticed that the copy available is v0.93.
I've got v1.03.
Does anybody know what the newest one last available on the rulesemporium
site was? Anybody got something newer than v1.03?
I've done a bit of hacking to my copy (such as adding the S/O ratio stats).
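A note on reading the S/O column: judging from the figures quoted in this thread, it appears to be the spam hit count divided by the total (spam plus ham) hit count for a rule. That definition is inferred from the numbers, not taken from the modified script, so treat this Python sketch as an assumption. The function name s_o_ratio is made up for this example.

```python
def s_o_ratio(spam_hits: int, ham_hits: int) -> float:
    """Spam/overall ratio: fraction of a rule's hits that landed on spam."""
    total = spam_hits + ham_hits
    return spam_hits / total if total else 0.0

# Counts taken from the one-month tables quoted in this thread, for the
# two rules that appear in both the spam and the ham top-10 lists.
print(round(s_o_ratio(114907, 84741), 4))   # T__BOTNET_NOTRUST, table shows 0.5755
print(round(s_o_ratio(90850, 171992), 4))   # HTML_MESSAGE, table shows 0.3456
```

A value near 1.0 means the rule fires almost only on spam, a value near 0.0 means almost only on ham, and a value around 0.5 means the rule says little either way.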
On Thu, 10 Mar 2016, Erickarlo Porro wrote:
>
> I would like to know how to get these stats too.
>
>
>
> From: Robert Chalmers [mailto:robert@chalmers.com.au]
> Sent: Tuesday, March 08, 2016 5:25 AM
> To: users@spamassassin.apache.org
> Subject: Re: Missed spam, suggestions?
>
>
>
> Can I ask, how are you getting these stats please?
>
>
>
> Thanks
>
> On 8 Mar 2016, at 05:11, David B Funk <db...@engineering.uiowa.edu> wrote:
>
>
>
> On Mon, 7 Mar 2016, Charles Sprickman wrote:
>
>
> I’ve been running with some daily training for a little over a week and I’m seeing less spam in my
> inbox. I’ve seen a few things slip through because bayes tipped them below the default score, these
> were two phishing emails.
>
> Here’s some rule stats for anyone interested:
>
> TOP SPAM RULES FIRED
>
> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>
> 1 TXREP 13171 8.47 40.38 91.00 72.91
> 2 HTML_MESSAGE 12714 8.18 38.98 87.85 90.80
> 3 DCC_CHECK 10593 6.81 32.48 73.19 33.78
> 4 RDNS_NONE 10269 6.60 31.48 70.95 5.63
> 5 SPF_HELO_PASS 10070 6.48 30.87 69.58 23.41
> 6 URIBL_BLACK 9711 6.25 29.77 67.10 1.58
> 7 BODY_NEWDOMAIN_FMBLA 9550 6.14 29.28 65.98 1.64
> 8 FROM_NEWDOMAIN_FMBLA 9483 6.10 29.07 65.52 1.36
> 9 BAYES_99 8486 5.46 26.02 58.63 1.18
> 10 BAYES_999 8141 5.24 24.96 56.25 1.06
>
> TOP HAM RULES FIRED
>
> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>
> 1 HTML_MESSAGE 16473 9.13 50.51 87.85 90.80
> 2 DKIM_SIGNED 13776 7.64 42.24 13.81 75.93
> 3 TXREP 13228 7.33 40.56 91.00 72.91
> 4 DKIM_VALID 12962 7.19 39.74 11.93 71.44
> 5 RCVD_IN_DNSWL_NONE 9941 5.51 30.48 8.08 54.79
> 6 DKIM_VALID_AU 8711 4.83 26.71 7.99 48.01
> 7 BAYES_00 8390 4.65 25.72 1.84 46.24
> 8 RCVD_IN_JMF_W 7369 4.09 22.59 2.54 40.62
> 9 RCVD_IN_MSPIKE_WL 6713 3.72 20.58 4.39 37.00
> 10 BAYES_50 6201 3.44 19.01 25.56 34.18
>
>
> Based upon your stats it looks like you need more Bayes training. Your Bayes 00/99 hits should rank higher in the
> rules-fired stats and BAYES_50 shouldn't be in the top-10 at all.
> (of course if you've only been training for a week that would explain it).
>
> For example, here's my top-10 hits (for a one month interval).
>
> TOP SPAM RULES FIRED
> ----------------------------------------------------------------------
> RANK RULE NAME COUNT %OFMAIL %OFSPAM %OFHAM S/O
> ----------------------------------------------------------------------
> 1 T__BOTNET_NOTRUST 114907 60.32 86.81 42.66 0.5755
> 2 BAYES_99 109138 32.98 82.45 0.01 0.9998
> 3 BAYES_999 104903 31.70 79.25 0.01 0.9999
> 4 HTML_MESSAGE 90850 79.41 68.63 86.59 0.3456
> 5 URIBL_BLACK 90845 27.61 68.63 0.27 0.9942
> 6 T_QUARANTINE_1 90640 27.40 68.47 0.02 0.9996
> 7 URIBL_DBL_SPAM 79152 24.02 59.79 0.17 0.9956
> 8 KAM_VERY_BLACK_DBL 74301 22.45 56.13 0.00 1.0000
> 9 L_FROM_SPAMMER1k 73667 22.26 55.65 0.00 1.0000
> 10 T__RECEIVED_1 72413 42.60 54.70 34.54 0.5135
>
> TOP HAM RULES FIRED
> ----------------------------------------------------------------------
> RANK RULE NAME COUNT %OFMAIL %OFSPAM %OFHAM S/O
> ----------------------------------------------------------------------
> 1 BAYES_00 182674 56.03 2.11 91.97 0.0150
> 2 HTML_MESSAGE 171992 79.41 68.63 86.59 0.3456
> 3 SPF_PASS 136623 63.08 54.52 68.78 0.3457
> 4 T_RP_MATCHES_RCVD 130879 53.75 35.54 65.89 0.2644
> 5 T__RECEIVED_2 125492 53.76 39.62 63.18 0.2947
> 6 DKIM_SIGNED 114808 38.57 9.72 57.80 0.1008
> 7 DKIM_VALID 105385 34.70 7.16 53.06 0.0825
> 8 RCVD_IN_DNSWL_NONE 92951 29.90 4.56 46.80 0.0609
> 9 T__BOTNET_NOTRUST 84741 60.32 86.81 42.66 0.5755
> 10 KHOP_RCVD_TRUST 84623 26.44 2.19 42.60 0.0331
>
> Note how highly BAYES 00/99 ranked. What you don't see is that BAYES_50 is way down in the mud (below 50 rank).
>
> BTW, this is with a Bayes that is mostly fed via auto-learning. I occasionally
> hand-feed corner cases that get misclassified (usually things like phishes, or conference announcements that can
> look shaky).
>
>
> --
> Dave Funk University of Iowa
> <dbfunk (at) engineering.uiowa.edu> College of Engineering
> 319/335-5751 FAX: 319/384-0549 1256 Seamans Center
> Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
> #include <std_disclaimer.h>
> Better is not better, 'standard' is better. B{
>
>
>
> Robert Chalmers
>
> robert@chalmers.com.au Quantum Radio: http://tinyurl.com/lwwddov
>
> Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11. XCode 7.2.1
>
> 2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay
>
>
>
>
>
>
>
>
>
--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
RE: Missed spam, suggestions?
Posted by Erickarlo Porro <ep...@earthcam.com>.
I would like to know how to get these stats too.
From: Robert Chalmers [mailto:robert@chalmers.com.au]
Sent: Tuesday, March 08, 2016 5:25 AM
To: users@spamassassin.apache.org
Subject: Re: Missed spam, suggestions?
Can I ask, how are you getting these stats please?
Thanks
On 8 Mar 2016, at 05:11, David B Funk <db...@engineering.uiowa.edu> wrote:
On Mon, 7 Mar 2016, Charles Sprickman wrote:
I’ve been running with some daily training for a little over a week and I’m seeing less spam in my inbox. I’ve seen a few things slip through because bayes tipped them below the default score, these were two phishing emails.
Here’s some rule stats for anyone interested:
TOP SPAM RULES FIRED
RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
1 TXREP 13171 8.47 40.38 91.00 72.91
2 HTML_MESSAGE 12714 8.18 38.98 87.85 90.80
3 DCC_CHECK 10593 6.81 32.48 73.19 33.78
4 RDNS_NONE 10269 6.60 31.48 70.95 5.63
5 SPF_HELO_PASS 10070 6.48 30.87 69.58 23.41
6 URIBL_BLACK 9711 6.25 29.77 67.10 1.58
7 BODY_NEWDOMAIN_FMBLA 9550 6.14 29.28 65.98 1.64
8 FROM_NEWDOMAIN_FMBLA 9483 6.10 29.07 65.52 1.36
9 BAYES_99 8486 5.46 26.02 58.63 1.18
10 BAYES_999 8141 5.24 24.96 56.25 1.06
TOP HAM RULES FIRED
RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
1 HTML_MESSAGE 16473 9.13 50.51 87.85 90.80
2 DKIM_SIGNED 13776 7.64 42.24 13.81 75.93
3 TXREP 13228 7.33 40.56 91.00 72.91
4 DKIM_VALID 12962 7.19 39.74 11.93 71.44
5 RCVD_IN_DNSWL_NONE 9941 5.51 30.48 8.08 54.79
6 DKIM_VALID_AU 8711 4.83 26.71 7.99 48.01
7 BAYES_00 8390 4.65 25.72 1.84 46.24
8 RCVD_IN_JMF_W 7369 4.09 22.59 2.54 40.62
9 RCVD_IN_MSPIKE_WL 6713 3.72 20.58 4.39 37.00
10 BAYES_50 6201 3.44 19.01 25.56 34.18
Based upon your stats it looks like you need more Bayes training. Your Bayes 00/99 hits should rank higher in the rules-fired stats and BAYES_50 shouldn't be in the top-10 at all.
(of course if you've only been training for a week that would explain it).
For example, here's my top-10 hits (for a one month interval).
TOP SPAM RULES FIRED
----------------------------------------------------------------------
RANK RULE NAME COUNT %OFMAIL %OFSPAM %OFHAM S/O
----------------------------------------------------------------------
1 T__BOTNET_NOTRUST 114907 60.32 86.81 42.66 0.5755
2 BAYES_99 109138 32.98 82.45 0.01 0.9998
3 BAYES_999 104903 31.70 79.25 0.01 0.9999
4 HTML_MESSAGE 90850 79.41 68.63 86.59 0.3456
5 URIBL_BLACK 90845 27.61 68.63 0.27 0.9942
6 T_QUARANTINE_1 90640 27.40 68.47 0.02 0.9996
7 URIBL_DBL_SPAM 79152 24.02 59.79 0.17 0.9956
8 KAM_VERY_BLACK_DBL 74301 22.45 56.13 0.00 1.0000
9 L_FROM_SPAMMER1k 73667 22.26 55.65 0.00 1.0000
10 T__RECEIVED_1 72413 42.60 54.70 34.54 0.5135
TOP HAM RULES FIRED
----------------------------------------------------------------------
RANK RULE NAME COUNT %OFMAIL %OFSPAM %OFHAM S/O
----------------------------------------------------------------------
1 BAYES_00 182674 56.03 2.11 91.97 0.0150
2 HTML_MESSAGE 171992 79.41 68.63 86.59 0.3456
3 SPF_PASS 136623 63.08 54.52 68.78 0.3457
4 T_RP_MATCHES_RCVD 130879 53.75 35.54 65.89 0.2644
5 T__RECEIVED_2 125492 53.76 39.62 63.18 0.2947
6 DKIM_SIGNED 114808 38.57 9.72 57.80 0.1008
7 DKIM_VALID 105385 34.70 7.16 53.06 0.0825
8 RCVD_IN_DNSWL_NONE 92951 29.90 4.56 46.80 0.0609
9 T__BOTNET_NOTRUST 84741 60.32 86.81 42.66 0.5755
10 KHOP_RCVD_TRUST 84623 26.44 2.19 42.60 0.0331
Note how highly BAYES 00/99 ranked. What you don't see is that BAYES_50 is way down in the mud (below 50 rank).
BTW, this is with a Bayes that is mostly fed via auto-learning. I occasionally
hand-feed corner cases that get misclassified (usually things like phishes, or conference announcements that can look shaky).
--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
Robert Chalmers
robert@chalmers.com.au Quantum Radio: http://tinyurl.com/lwwddov
Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11. XCode 7.2.1
2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay
Re: Missed spam, suggestions?
Posted by Robert Chalmers <ro...@chalmers.com.au>.
Can I ask, how are you getting these stats please?
Thanks
> On 8 Mar 2016, at 05:11, David B Funk <db...@engineering.uiowa.edu> wrote:
>
> On Mon, 7 Mar 2016, Charles Sprickman wrote:
>
>> I’ve been running with some daily training for a little over a week and I’m seeing less spam in my inbox. I’ve seen a few things slip through because bayes tipped them below the default score, these were two phishing emails.
>>
>> Here’s some rule stats for anyone interested:
>>
>> TOP SPAM RULES FIRED
>>
>> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>>
>> 1 TXREP 13171 8.47 40.38 91.00 72.91
>> 2 HTML_MESSAGE 12714 8.18 38.98 87.85 90.80
>> 3 DCC_CHECK 10593 6.81 32.48 73.19 33.78
>> 4 RDNS_NONE 10269 6.60 31.48 70.95 5.63
>> 5 SPF_HELO_PASS 10070 6.48 30.87 69.58 23.41
>> 6 URIBL_BLACK 9711 6.25 29.77 67.10 1.58
>> 7 BODY_NEWDOMAIN_FMBLA 9550 6.14 29.28 65.98 1.64
>> 8 FROM_NEWDOMAIN_FMBLA 9483 6.10 29.07 65.52 1.36
>> 9 BAYES_99 8486 5.46 26.02 58.63 1.18
>> 10 BAYES_999 8141 5.24 24.96 56.25 1.06
>>
>> TOP HAM RULES FIRED
>>
>> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>>
>> 1 HTML_MESSAGE 16473 9.13 50.51 87.85 90.80
>> 2 DKIM_SIGNED 13776 7.64 42.24 13.81 75.93
>> 3 TXREP 13228 7.33 40.56 91.00 72.91
>> 4 DKIM_VALID 12962 7.19 39.74 11.93 71.44
>> 5 RCVD_IN_DNSWL_NONE 9941 5.51 30.48 8.08 54.79
>> 6 DKIM_VALID_AU 8711 4.83 26.71 7.99 48.01
>> 7 BAYES_00 8390 4.65 25.72 1.84 46.24
>> 8 RCVD_IN_JMF_W 7369 4.09 22.59 2.54 40.62
>> 9 RCVD_IN_MSPIKE_WL 6713 3.72 20.58 4.39 37.00
>> 10 BAYES_50 6201 3.44 19.01 25.56 34.18
>>
>
> Based upon your stats it looks like you need more Bayes training. Your Bayes 00/99 hits should rank higher in the rules-fired stats and BAYES_50 shouldn't be in the top-10 at all.
> (of course if you've only been training for a week that would explain it).
>
> For example, here's my top-10 hits (for a one month interval).
>
> TOP SPAM RULES FIRED
> ----------------------------------------------------------------------
> RANK RULE NAME COUNT %OFMAIL %OFSPAM %OFHAM S/O
> ----------------------------------------------------------------------
> 1 T__BOTNET_NOTRUST 114907 60.32 86.81 42.66 0.5755
> 2 BAYES_99 109138 32.98 82.45 0.01 0.9998
> 3 BAYES_999 104903 31.70 79.25 0.01 0.9999
> 4 HTML_MESSAGE 90850 79.41 68.63 86.59 0.3456
> 5 URIBL_BLACK 90845 27.61 68.63 0.27 0.9942
> 6 T_QUARANTINE_1 90640 27.40 68.47 0.02 0.9996
> 7 URIBL_DBL_SPAM 79152 24.02 59.79 0.17 0.9956
> 8 KAM_VERY_BLACK_DBL 74301 22.45 56.13 0.00 1.0000
> 9 L_FROM_SPAMMER1k 73667 22.26 55.65 0.00 1.0000
> 10 T__RECEIVED_1 72413 42.60 54.70 34.54 0.5135
>
> TOP HAM RULES FIRED
> ----------------------------------------------------------------------
> RANK RULE NAME COUNT %OFMAIL %OFSPAM %OFHAM S/O
> ----------------------------------------------------------------------
> 1 BAYES_00 182674 56.03 2.11 91.97 0.0150
> 2 HTML_MESSAGE 171992 79.41 68.63 86.59 0.3456
> 3 SPF_PASS 136623 63.08 54.52 68.78 0.3457
> 4 T_RP_MATCHES_RCVD 130879 53.75 35.54 65.89 0.2644
> 5 T__RECEIVED_2 125492 53.76 39.62 63.18 0.2947
> 6 DKIM_SIGNED 114808 38.57 9.72 57.80 0.1008
> 7 DKIM_VALID 105385 34.70 7.16 53.06 0.0825
> 8 RCVD_IN_DNSWL_NONE 92951 29.90 4.56 46.80 0.0609
> 9 T__BOTNET_NOTRUST 84741 60.32 86.81 42.66 0.5755
> 10 KHOP_RCVD_TRUST 84623 26.44 2.19 42.60 0.0331
>
> Note how highly BAYES 00/99 ranked. What you don't see is that BAYES_50 is way down in the mud (below 50 rank).
>
> BTW, this is with a Bayes that is mostly fed via auto-learning. I occasionally
> hand-feed corner cases that get misclassified (usually things like phishes, or conference announcements that can look shaky).
>
>
> --
> Dave Funk University of Iowa
> <dbfunk (at) engineering.uiowa.edu> College of Engineering
> 319/335-5751 FAX: 319/384-0549 1256 Seamans Center
> Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
> #include <std_disclaimer.h>
> Better is not better, 'standard' is better. B{
Robert Chalmers
robert@chalmers.com.au Quantum Radio: http://tinyurl.com/lwwddov
Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11. XCode 7.2.1
2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay
Re: Missed spam, suggestions?
Posted by David B Funk <db...@engineering.uiowa.edu>.
On Mon, 7 Mar 2016, Charles Sprickman wrote:
> I’ve been running with some daily training for a little over a week and I’m seeing less spam in my inbox. I’ve seen a few things slip through because bayes tipped them below the default score, these were two phishing emails.
>
> Here’s some rule stats for anyone interested:
>
> TOP SPAM RULES FIRED
>
> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>
> 1 TXREP 13171 8.47 40.38 91.00 72.91
> 2 HTML_MESSAGE 12714 8.18 38.98 87.85 90.80
> 3 DCC_CHECK 10593 6.81 32.48 73.19 33.78
> 4 RDNS_NONE 10269 6.60 31.48 70.95 5.63
> 5 SPF_HELO_PASS 10070 6.48 30.87 69.58 23.41
> 6 URIBL_BLACK 9711 6.25 29.77 67.10 1.58
> 7 BODY_NEWDOMAIN_FMBLA 9550 6.14 29.28 65.98 1.64
> 8 FROM_NEWDOMAIN_FMBLA 9483 6.10 29.07 65.52 1.36
> 9 BAYES_99 8486 5.46 26.02 58.63 1.18
> 10 BAYES_999 8141 5.24 24.96 56.25 1.06
>
> TOP HAM RULES FIRED
>
> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>
> 1 HTML_MESSAGE 16473 9.13 50.51 87.85 90.80
> 2 DKIM_SIGNED 13776 7.64 42.24 13.81 75.93
> 3 TXREP 13228 7.33 40.56 91.00 72.91
> 4 DKIM_VALID 12962 7.19 39.74 11.93 71.44
> 5 RCVD_IN_DNSWL_NONE 9941 5.51 30.48 8.08 54.79
> 6 DKIM_VALID_AU 8711 4.83 26.71 7.99 48.01
> 7 BAYES_00 8390 4.65 25.72 1.84 46.24
> 8 RCVD_IN_JMF_W 7369 4.09 22.59 2.54 40.62
> 9 RCVD_IN_MSPIKE_WL 6713 3.72 20.58 4.39 37.00
> 10 BAYES_50 6201 3.44 19.01 25.56 34.18
>
Based upon your stats it looks like you need more Bayes training.
Your Bayes 00/99 hits should rank higher in the rules-fired stats and BAYES_50
shouldn't be in the top-10 at all.
(of course if you've only been training for a week that would explain it).
For example, here's my top-10 hits (for a one month interval).
TOP SPAM RULES FIRED
------------------------------------------------------------------
RANK  RULE NAME            COUNT  %OFMAIL  %OFSPAM  %OFHAM     S/O
------------------------------------------------------------------
   1  T__BOTNET_NOTRUST   114907    60.32    86.81   42.66  0.5755
   2  BAYES_99            109138    32.98    82.45    0.01  0.9998
   3  BAYES_999           104903    31.70    79.25    0.01  0.9999
   4  HTML_MESSAGE         90850    79.41    68.63   86.59  0.3456
   5  URIBL_BLACK          90845    27.61    68.63    0.27  0.9942
   6  T_QUARANTINE_1       90640    27.40    68.47    0.02  0.9996
   7  URIBL_DBL_SPAM       79152    24.02    59.79    0.17  0.9956
   8  KAM_VERY_BLACK_DBL   74301    22.45    56.13    0.00  1.0000
   9  L_FROM_SPAMMER1k     73667    22.26    55.65    0.00  1.0000
  10  T__RECEIVED_1        72413    42.60    54.70   34.54  0.5135

TOP HAM RULES FIRED
------------------------------------------------------------------
RANK  RULE NAME            COUNT  %OFMAIL  %OFSPAM  %OFHAM     S/O
------------------------------------------------------------------
   1  BAYES_00            182674    56.03     2.11   91.97  0.0150
   2  HTML_MESSAGE        171992    79.41    68.63   86.59  0.3456
   3  SPF_PASS            136623    63.08    54.52   68.78  0.3457
   4  T_RP_MATCHES_RCVD   130879    53.75    35.54   65.89  0.2644
   5  T__RECEIVED_2       125492    53.76    39.62   63.18  0.2947
   6  DKIM_SIGNED         114808    38.57     9.72   57.80  0.1008
   7  DKIM_VALID          105385    34.70     7.16   53.06  0.0825
   8  RCVD_IN_DNSWL_NONE   92951    29.90     4.56   46.80  0.0609
   9  T__BOTNET_NOTRUST    84741    60.32    86.81   42.66  0.5755
  10  KHOP_RCVD_TRUST      84623    26.44     2.19   42.60  0.0331

Note how highly BAYES_00 and BAYES_99 ranked. What you don't see is that BAYES_50
is way down in the mud (ranked below 50).
BTW, this is with a Bayes that is mostly fed via auto-learning. I occasionally
hand-feed corner cases that get misclassified (usually things like phishes, or
conference announcements that can look shaky).
--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
Re: Missed spam, suggestions?
Posted by John Hardin <jh...@impsec.org>.
On Tue, 8 Mar 2016, Matus UHLAR - fantomas wrote:
>> On Mar 8, 2016, at 7:31 AM, Matus UHLAR - fantomas <uh...@fantomas.sk>
>> wrote:
>> > how can these two stats be different?
>
> On 08.03.16 10:19, @lbutlr wrote:
>> Because one is for SPAM and one is for HAM.
>
> TOP SPAM RULES FIRED
>
> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>
> 2 HTML_MESSAGE 12714 8.18 38.98 87.85 90.80
>
> TOP HAM RULES FIRED
>
> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>
> 1 HTML_MESSAGE 16473 9.13 50.51 87.85 90.80
>
>
> Why did the same rule hit 38.98% of all mail and 50.51% of all mail?
Speculation: 38.98 %OFMAIL = %OFSPAM * %SPAM, not %TOTAL
so: HTML_MESSAGE hit 87.85% of spam, and *that* was 38.98% of total
messages processed.
?
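
The speculation can be checked with a little arithmetic on the two quoted HTML_MESSAGE rows. The sketch below backs the spam and ham totals out of each row's count and percentage, then divides each count by the combined total. The variable and function names are hypothetical, and the totals are only approximate because the posted percentages are rounded.

```python
def class_total(hits, pct_of_class):
    """Estimate how many messages are in a class from one rule's row."""
    return hits / (pct_of_class / 100.0)


# HTML_MESSAGE rows as quoted above: count and %OFSPAM / %OFHAM.
spam_hits, pct_of_spam = 12714, 87.85
ham_hits, pct_of_ham = 16473, 90.80

total_spam = class_total(spam_hits, pct_of_spam)
total_ham = class_total(ham_hits, pct_of_ham)
total_mail = total_spam + total_ham

# Share of ALL mail that was spam hitting the rule, and ham hitting the rule.
spam_table_pct = 100.0 * spam_hits / total_mail
ham_table_pct = 100.0 * ham_hits / total_mail

print(f"estimated totals: spam={total_spam:.0f} ham={total_ham:.0f}")
print(f"%OFMAIL from spam hits only: {spam_table_pct:.2f}")
print(f"%OFMAIL from ham hits only:  {ham_table_pct:.2f}")
print(f"rule's share of all mail:    {spam_table_pct + ham_table_pct:.2f}")
```

Under these assumptions the two results come out at about 38.98 and 50.51, matching the two %OFMAIL values in question. That suggests each table's %OFMAIL counts only that table's own hits against the total message count.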
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Failure to plan ahead on someone else's part does not constitute
an emergency on my part. -- David W. Barts in a.s.r
-----------------------------------------------------------------------
5 days until Daylight Saving Time begins in U.S. - Spring Forward
Re: Missed spam, suggestions?
Posted by Benny Pedersen <me...@junc.eu>.
On 8. mar. 2016 18.42.03 Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:
> Why did the same rule hit 38.98% of all mail and 50.51% of all mail?
grep foo ./hamfolder
grep bar ./spamfolder
Why should both folders need to contain the same number of mails?
Re: Missed spam, suggestions?
Posted by David B Funk <db...@engineering.uiowa.edu>.
On Tue, 8 Mar 2016, Matus UHLAR - fantomas wrote:
>>>> On Mar 8, 2016, at 7:31 AM, Matus UHLAR - fantomas <uh...@fantomas.sk>
>>>> wrote:
>>>>> how can these two stats be different?
>
>>> On 08.03.16 10:19, @lbutlr wrote:
>>>> Because one is for SPAM and one is for HAM.
>
>>> On Mar 8, 2016, at 10:41 AM, Matus UHLAR - fantomas <uh...@fantomas.sk>
>>> wrote:
>>> Why did you remove the important part?
>
> On 08.03.16 11:16, @lbutlr wrote:
>> I didn’t.
>
> yes, you did, so I've had to paste them again below:
>
>>> TOP SPAM RULES FIRED
>>>
>>> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>>>
>>> 2 HTML_MESSAGE 12714 8.18 38.98 87.85 90.80
>>>
>>> TOP HAM RULES FIRED
>>>
>>> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>>>
>>> 1 HTML_MESSAGE 16473 9.13 50.51 87.85 90.80
>>>
>>>
>>> Why did the same rule hit 38.98% of all mail and 50.51% of all mail?
>>
>> Because one is checking SPAM and one is checking HAM.
>
> so why was %OFMAIL different from %OFSPAM in the first case and from %OFHAM
> in the second case?
>
>>> seems that the mail counts were different, but why?
>>
>> Because there are differing amounts of SPAM and HAM?
>
> If we are only checking spam for a given rule, how can the number of
> all hits differ from the number of spam hits? They should all be spam,
> shouldn't they?
Assuming that the OP was using Dallas Engelken's "sa-stats.pl" script
(I was), the report line for each rule should be IDENTICAL in the spam and ham
tables, apart from the rank and count columns.
This script takes as input a spamd's log output. It then aggregates a digest
of all the rule hits. In a given log report there will be lines that are
spam results ("spamd: result: Y 75") and lines that are ham results ("spamd: result: . -3").
For each line (spam & ham) there will be a list of the rules that fired on that
particular message:
2016-03-08T12:37:44.833847-06:00 s-l107 spamd[10463]: spamd: result: . -3 -
BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,KHOP_RCVD_TRUST,L_LOCAL_MUCHO_DOT_LINES2,RCVD_IN_DNSWL_LOW,RCVD_IN_HOSTKARMA_YE,RP_MATCHES_RCVD,SPF_PASS,T__RECEIVED_1
scantime=3.5,size=11059,user=redacted,uid=115,required_score=6.0,rhost=s-l012.engr.uiowa.edu,raddr=128.255.17.253,rport=35620,mid=<re...@email.amazonses.com>,bayes=0.000000,autolearn=ham
autolearn_force=no
So for the HTML_MESSAGE rule, I get stats of:
grep HTML_MESSAGE sa-stats-dec.out
4 HTML_MESSAGE 90850 79.41 68.63 86.59 0.3456
2 HTML_MESSAGE 171992 79.41 68.63 86.59 0.3456
This means that of all the messages processed (for the duration of that log run)
that rule hit 79.41% of all messages, 68.63% of the lines classified as
spam (a count of 90850, resulting in a rank of 4) and 86.59% of the lines
classified as ham (a count of 171992, resulting in a rank of 2).
Thus for a given rule, the %all-messages, %spam and %ham values should be IDENTICAL
in both tables (assuming they are from the same log run).
So for the OP's original post, having %spam and %ham be identical but %all-messages
being different is weird. It could be that he's got a different version of
the sa-stats script; it has an additional field, that "%of-rules" thing.
So to Charles Sprickman, which sa-stats script did you use to generate your
rules report?
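
A minimal sketch of the kind of tally described above, assuming each spamd result is logged on a single line of the form "spamd: result: <Y or .> <score> - <comma-separated rules> scantime=...". This is not the sa-stats.pl code; the regular expression, the function names and the sample lines are illustrative assumptions.

```python
import re
from collections import Counter

# Assumed shape of a spamd result line: flag (Y = spam, . = ham), score, rules.
RESULT_RE = re.compile(
    r"spamd: result: (?P<flag>[Y.]) (?P<score>-?\d+) - (?P<rules>\S*)\s+scantime="
)


def tally(lines):
    """Count messages and per-rule hits separately for spam and ham."""
    totals = Counter()
    hits = {"spam": Counter(), "ham": Counter()}
    for line in lines:
        match = RESULT_RE.search(line)
        if not match:
            continue  # not a result line
        kind = "spam" if match.group("flag") == "Y" else "ham"
        totals[kind] += 1
        hits[kind].update(r for r in match.group("rules").split(",") if r)
    return totals, hits


def rule_stats(rule, totals, hits):
    """Percentages for one rule; the same values apply to both tables."""
    def pct(part, whole):
        return 100.0 * part / whole if whole else 0.0

    spam_hits, ham_hits = hits["spam"][rule], hits["ham"][rule]
    return {
        "pct_of_mail": pct(spam_hits + ham_hits, totals["spam"] + totals["ham"]),
        "pct_of_spam": pct(spam_hits, totals["spam"]),
        "pct_of_ham": pct(ham_hits, totals["ham"]),
    }


# Made-up sample lines in the assumed format.
SAMPLE_LINES = [
    "host spamd[1]: spamd: result: Y 75 - BAYES_99,HTML_MESSAGE,URIBL_BLACK scantime=1.0,size=100",
    "host spamd[1]: spamd: result: . -3 - BAYES_00,HTML_MESSAGE,SPF_PASS scantime=3.5,size=11059",
    "host spamd[1]: spamd: result: . 0 - BAYES_00 scantime=0.5,size=10",
    "host spamd[1]: spamd: connection from localhost",
]

totals, hits = tally(SAMPLE_LINES)
print(rule_stats("HTML_MESSAGE", totals, hits))
```

Because all three percentages are computed from the same totals, a rule's values come out the same whichever table it is listed in; only its count and rank depend on the table.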
--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
Re: Missed spam, suggestions?
Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
>>> On Mar 8, 2016, at 7:31 AM, Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:
>>>> how can these two stats be different?
>> On 08.03.16 10:19, @lbutlr wrote:
>>> Because one is for SPAM and one is for HAM.
>> On Mar 8, 2016, at 10:41 AM, Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:
>> Why did you remove the important part?
On 08.03.16 11:16, @lbutlr wrote:
>I didn’t.
yes, you did, so I've had to paste them again below:
>> TOP SPAM RULES FIRED
>>
>> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>>
>> 2 HTML_MESSAGE 12714 8.18 38.98 87.85 90.80
>>
>> TOP HAM RULES FIRED
>>
>> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>>
>> 1 HTML_MESSAGE 16473 9.13 50.51 87.85 90.80
>>
>>
>> Why did the same rule hit 38.98% of all mail and 50.51% of all mail?
>
>Because one is checking SPAM and one is checking HAM.
so why was %OFMAIL different from %OFSPAM in the first case and from %OFHAM
in the second case?
>> seems that the mail counts were different, but why?
>
>Because there are differing amounts of SPAM and HAM?
If we are only checking spam for a given rule, how can the number of
all hits differ from the number of spam hits? They should all be spam,
shouldn't they?
--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
On the other hand, you have different fingers.
Re: Missed spam, suggestions?
Posted by "@lbutlr" <kr...@kreme.com>.
> On Mar 8, 2016, at 10:41 AM, Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:
>
>> On Mar 8, 2016, at 7:31 AM, Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:
>>> how can these two stats be different?
>
> On 08.03.16 10:19, @lbutlr wrote:
>> Because one is for SPAM and one is for HAM.
>
> Why did you remove the important part?
I didn’t.
> TOP SPAM RULES FIRED
>
> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>
> 2 HTML_MESSAGE 12714 8.18 38.98 87.85 90.80
>
> TOP HAM RULES FIRED
>
> RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>
> 1 HTML_MESSAGE 16473 9.13 50.51 87.85 90.80
>
>
> Why did the same rule hit 38.98% of all mail and 50.51% of all mail?
Because one is checking SPAM and one is checking HAM.
> seems that the mail counts were different, but why?
Because there are differing amounts of SPAM and HAM?
--
"Rosa sat, so Martin could walk. Martin walked, so Obama could run.
Obama ran, so our children can fly." (paraphrased from NPR)
Re: Missed spam, suggestions?
Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
>On Mar 8, 2016, at 7:31 AM, Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:
>> how can these two stats be different?
On 08.03.16 10:19, @lbutlr wrote:
>Because one is for SPAM and one is for HAM.
Why did you remove the important part?
TOP SPAM RULES FIRED
RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
2 HTML_MESSAGE 12714 8.18 38.98 87.85 90.80
TOP HAM RULES FIRED
RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
1 HTML_MESSAGE 16473 9.13 50.51 87.85 90.80
Why did the same rule hit 38.98% of all mail and 50.51% of all mail?
It seems that the mail counts were different, but why?
Did Charles generate the stats at very different times?
Comparing results from the same set would be much better...
--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
A day without sunshine is like, night.
Re: Missed spam, suggestions?
Posted by "@lbutlr" <kr...@kreme.com>.
On Mar 8, 2016, at 7:31 AM, Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:
> how can these two stats be different?
Because one is for SPAM and one is for HAM.
--
No man is free who is not master of himself
Re: Missed spam, suggestions?
Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 07.03.16 23:39, Charles Sprickman wrote:
>TOP SPAM RULES FIRED
>
>RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>
> 2 HTML_MESSAGE 12714 8.18 38.98 87.85 90.80
>TOP HAM RULES FIRED
>
>RANK RULE NAME COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
>
> 1 HTML_MESSAGE 16473 9.13 50.51 87.85 90.80
how can these two stats be different?
--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
"They say when you play that M$ CD backward you can hear satanic messages."
"That's nothing. If you play it forward it will install Windows."
Re: Missed spam, suggestions?
Posted by Charles Sprickman <sp...@bway.net>.
> On Feb 29, 2016, at 3:18 PM, Reindl Harald <h....@thelounge.net> wrote:
>
> On 29.02.2016 at 21:05, Charles Sprickman wrote:
>>> On Feb 29, 2016, at 4:23 AM, Reindl Harald <h....@thelounge.net> wrote:
>>>
>>> On 29.02.2016 at 06:24, Charles Sprickman wrote:
>>>> I’ve not had much luck with Bayes - when I had it enabled recently on a per-user basis it was just hitting the master DB server too hard with updates
>>>
>>> just make a sitewide bayes (https://wiki.apache.org/spamassassin/SiteWideBayesSetup) without autolearn / autoexpire and the default database in a folder read-only for the daemon
>>>
>>
>> I think I still have to stick with a db-backed option since I need to keep two SA servers in sync.
>
> And I know that it doesn't matter.
>
> Nothing is easier than rsyncing the Bayes folder to several machines at the end of the learning script. We even share the site-wide Bayes over web services with external entities, so in total it covers around 5000 users at the moment.
I’m not seeing much of a change in load after enabling this with a global user and no autolearn. I think the db was really only constrained on the inserts/updates.
>
>> I’ll try that today and see how the load looks. My concern with disabling autolearn is that then I’m the only one training. My spam probably looks like everyone else’s, but my ham is very different, lots of list traffic and such.
>
> You should be the only one who trains in most cases, for several reasons:
>
> * few to zero users train enough ham and spam for a proper Bayes
> * wrongly classified autolearn goes in the wrong direction sooner or later
>
> We have now maintained a site-wide Bayes for inbound MX for more than a year, re-used on the submission servers to minimize the impact of hacked accounts. It works so much better than all the "user Bayes" solutions of the last decade that it's the way to go if you *really* want proper operations.
I’ve been running with some daily training for a little over a week and I’m seeing less spam in my inbox. I’ve seen a few things slip through because Bayes tipped them below the default score; these were two phishing emails.
Here’s some rule stats for anyone interested:
TOP SPAM RULES FIRED

RANK  RULE NAME             COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
   1  TXREP                 13171      8.47    40.38    91.00   72.91
   2  HTML_MESSAGE          12714      8.18    38.98    87.85   90.80
   3  DCC_CHECK             10593      6.81    32.48    73.19   33.78
   4  RDNS_NONE             10269      6.60    31.48    70.95    5.63
   5  SPF_HELO_PASS         10070      6.48    30.87    69.58   23.41
   6  URIBL_BLACK            9711      6.25    29.77    67.10    1.58
   7  BODY_NEWDOMAIN_FMBLA   9550      6.14    29.28    65.98    1.64
   8  FROM_NEWDOMAIN_FMBLA   9483      6.10    29.07    65.52    1.36
   9  BAYES_99               8486      5.46    26.02    58.63    1.18
  10  BAYES_999              8141      5.24    24.96    56.25    1.06

TOP HAM RULES FIRED

RANK  RULE NAME             COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
   1  HTML_MESSAGE          16473      9.13    50.51    87.85   90.80
   2  DKIM_SIGNED           13776      7.64    42.24    13.81   75.93
   3  TXREP                 13228      7.33    40.56    91.00   72.91
   4  DKIM_VALID            12962      7.19    39.74    11.93   71.44
   5  RCVD_IN_DNSWL_NONE     9941      5.51    30.48     8.08   54.79
   6  DKIM_VALID_AU          8711      4.83    26.71     7.99   48.01
   7  BAYES_00               8390      4.65    25.72     1.84   46.24
   8  RCVD_IN_JMF_W          7369      4.09    22.59     2.54   40.62
   9  RCVD_IN_MSPIKE_WL      6713      3.72    20.58     4.39   37.00
  10  BAYES_50               6201      3.44    19.01    25.56   34.18

Charles
Re: Missed spam, suggestions?
Posted by Reindl Harald <h....@thelounge.net>.
On 29.02.2016 at 21:05, Charles Sprickman wrote:
>> On Feb 29, 2016, at 4:23 AM, Reindl Harald <h....@thelounge.net> wrote:
>>
>> On 29.02.2016 at 06:24, Charles Sprickman wrote:
>>> I’ve not had much luck with Bayes - when I had it enabled recently on a per-user basis it was just hitting the master DB server too hard with updates
>>
>> just make a sitewide bayes (https://wiki.apache.org/spamassassin/SiteWideBayesSetup) without autolearn / autoexpire and the default database in a folder read-only for the daemon
>>
>
> I think I still have to stick with a db-backed option since I need to keep two SA servers in sync.
And I know that it doesn't matter.
Nothing is easier than rsyncing the Bayes folder to several machines at the
end of the learning script. We even share the site-wide Bayes over
web services with external entities, so in total it covers around 5000 users
at the moment.
> I’ll try that today and see how the load looks. My concern with disabling autolearn is that then I’m the only one training. My spam probably looks like everyone else’s, but my ham is very different, lots of list traffic and such.
You should be the only one who trains in most cases, for several reasons:
* few to zero users train enough ham and spam for a proper Bayes
* wrongly classified autolearn goes in the wrong direction sooner or later
We have now maintained a site-wide Bayes for inbound MX for more than a
year, re-used on the submission servers to minimize the impact of
hacked accounts. It works so much better than all the "user Bayes"
solutions of the last decade that it's the way to go if you *really* want
proper operations.
Re: Missed spam, suggestions?
Posted by Charles Sprickman <sp...@bway.net>.
> On Feb 29, 2016, at 4:23 AM, Reindl Harald <h....@thelounge.net> wrote:
>
>
>
> On 29.02.2016 at 06:24, Charles Sprickman wrote:
>> I’ve not had much luck with Bayes - when I had it enabled recently on a per-user basis it was just hitting the master DB server too hard with updates
>
> just make a sitewide bayes (https://wiki.apache.org/spamassassin/SiteWideBayesSetup) without autolearn / autoexpire and the default database in a folder read-only for the daemon
>
I think I still have to stick with a db-backed option since I need to keep two SA servers in sync.
I’ll try that today and see how the load looks. My concern with disabling autolearn is that then I’m the only one training. My spam probably looks like everyone else’s, but my ham is very different, lots of list traffic and such.
> a filter without bayes is worthless
It seems so. :)
Thanks,
Charles
--
Charles Sprickman
NetEng/SysAdmin
Bway.net - New York's Best Internet www.bway.net
spork@bway.net - 212.982.9800
>
> 0 61323 SPAM
> 0 21811 HAM
> 0 2547152 TOKEN
>
> total 73M
> -rw------- 1 sa-milt sa-milt 10M 2016-02-29 00:21 bayes_seen
> -rw------- 1 sa-milt sa-milt 81M 2016-02-29 00:21 bayes_toks
>
> BAYES_00 29161 73.70 %
> BAYES_05 764 1.93 %
> BAYES_20 931 2.35 %
> BAYES_40 815 2.05 %
> BAYES_50 2909 7.35 %
> BAYES_60 424 1.07 % 8.14 % (OF TOTAL BLOCKED)
> BAYES_80 337 0.85 % 6.47 % (OF TOTAL BLOCKED)
> BAYES_95 306 0.77 % 5.87 % (OF TOTAL BLOCKED)
> BAYES_99 3918 9.90 % 75.25 % (OF TOTAL BLOCKED)
> BAYES_999 3491 8.82 % 67.05 % (OF TOTAL BLOCKED)
>
> DNSWL 53551 91.16 %
> SPF 38530 65.59 %
> SPF/DKIM WL 16750 28.51 %
> SHORTCIRCUIT 19112 32.53 %
>
> BLOCKED 5206 8.86 %
> SPAMMY 4985 8.48 % 95.75 % (OF TOTAL BLOCKED)
>
Re: Missed spam, suggestions?
Posted by Reindl Harald <h....@thelounge.net>.
On 29.02.2016 at 06:24, Charles Sprickman wrote:
> I’ve not had much luck with Bayes - when I had it enabled recently on a per-user basis it was just hitting the master DB server too hard with updates
just set up a site-wide Bayes
(https://wiki.apache.org/spamassassin/SiteWideBayesSetup) without
autolearn / autoexpire, with the default database in a folder that is
read-only for the daemon
a filter without bayes is worthless
0      61323  SPAM
0      21811  HAM
0    2547152  TOKEN

total 73M
-rw------- 1 sa-milt sa-milt 10M 2016-02-29 00:21 bayes_seen
-rw------- 1 sa-milt sa-milt 81M 2016-02-29 00:21 bayes_toks

BAYES_00        29161   73.70 %
BAYES_05          764    1.93 %
BAYES_20          931    2.35 %
BAYES_40          815    2.05 %
BAYES_50         2909    7.35 %
BAYES_60          424    1.07 %    8.14 % (OF TOTAL BLOCKED)
BAYES_80          337    0.85 %    6.47 % (OF TOTAL BLOCKED)
BAYES_95          306    0.77 %    5.87 % (OF TOTAL BLOCKED)
BAYES_99         3918    9.90 %   75.25 % (OF TOTAL BLOCKED)
BAYES_999        3491    8.82 %   67.05 % (OF TOTAL BLOCKED)

DNSWL           53551   91.16 %
SPF             38530   65.59 %
SPF/DKIM WL     16750   28.51 %
SHORTCIRCUIT    19112   32.53 %

BLOCKED          5206    8.86 %
SPAMMY           4985    8.48 %   95.75 % (OF TOTAL BLOCKED)
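The "OF TOTAL BLOCKED" figures above can be reproduced by dividing each rule's count by the BLOCKED count of 5206. The short Python sketch below does that arithmetic. That the posted values are truncated to two decimals rather than rounded is an assumption inferred from the numbers themselves (for example 306 / 5206 is about 5.878 %, shown as 5.87 %); the script that produced the report is not part of this thread.

```python
def truncated_percent(count, total):
    """Share of count in total as a percentage, truncated (not rounded) to two decimals."""
    hundredths = count * 10000 // total  # integer math avoids float rounding surprises
    return f"{hundredths // 100}.{hundredths % 100:02d}"


BLOCKED = 5206  # total blocked messages, taken from the report above

# Counts taken from the report above.
counts = {
    "BAYES_60": 424,
    "BAYES_80": 337,
    "BAYES_95": 306,
    "BAYES_99": 3918,
    "BAYES_999": 3491,
    "SPAMMY": 4985,
}

for rule, count in counts.items():
    print(f"{rule:<12}{count:>6}  {truncated_percent(count, BLOCKED):>6} % of blocked")
```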
Re: Missed spam, suggestions?
Posted by Tom Hendrikx <to...@whyscream.net>.
On 29-02-16 06:24, Charles Sprickman wrote:
> Hi all,
>
> Recently I occasionally get bursts of spam that slips through Postfix
> (postscreen BL checks, protocol checks) and SpamAssassin. I just had
> another big jump in the last week. This was mostly spam touting Oil
> Changes, SUV sales and Lawyer Finders.
>
> What I just did was go through a collection of missed spam and re-ran
> it through spamassassin. All of it jumped from originally scoring
> around 2-3 to a minimum of 6.5 with most hitting around 12. The
> biggest difference I see is that DNSBL and URIBL services had started
> hitting. When originally received, these emails all originated from
> very clean IPs.
>
> I have TXREP enabled as well, but that doesn’t seem to be having
> either a positive or negative impact.
>
> What are my options to try to catch this junk before it hits the
> various *BLs?
>
> I’ve not had much luck with Bayes - when I had it enabled recently on
> a per-user basis it was just hitting the master DB server too hard
> with updates. I’m considering enabling it again with a shared db for
> all users, which I hope might work better. It would only be auto
> trained, perhaps with some manual training by me.
>
> Here’s a few samples, hosted elsewhere so as not to trip anyone’s
> filters:
>
> https://gist.github.com/anonymous/0fcaf481875959c9151f (2.7 on
> Friday, 14 tonight)
>
> https://gist.github.com/anonymous/a5396f68699392808988 (3.4 earlier
> tonight, 6.5 just now)
>
> I have more samples, I can dig them up if that’s helpful.
>
> Sometimes I wonder how much this has to do with the age of our domain
> and the fact that it begins with “b”. :)
>
> The only thing I’ve been contemplating is a local spamtrap and DNSBL.
> We have a site that’s regularly trawled for email addresses, so
> seeding it should not be too difficult…
>
Hi,
You want to give the RBLs a bit more time to kick in, so you could
consider greylisting (or postscreen after-220 checks, which also cause a
delay and a retry).
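To illustrate why greylisting buys the blocklists time, here is a minimal Python sketch of the usual idea: the first delivery attempt from an unknown (client IP, sender, recipient) triplet is temporarily deferred, and only a retry after a minimum delay is accepted. The class name, the 300-second default, and the in-memory dict are assumptions made for this sketch; it is not the implementation of any particular greylisting daemon or of postscreen.

```python
class Greylist:
    """Toy greylisting decision: defer unknown triplets until min_delay has passed."""

    def __init__(self, min_delay=300):
        self.min_delay = min_delay  # seconds a new triplet must wait (assumed value)
        self.first_seen = {}        # triplet -> timestamp of the first attempt

    def check(self, client_ip, sender, recipient, now):
        """Return "defer" (temporary failure) or "accept" for this attempt."""
        key = (client_ip, sender.lower(), recipient.lower())
        # Record the first attempt; later attempts reuse the stored timestamp.
        first = self.first_seen.setdefault(key, now)
        if now - first < self.min_delay:
            return "defer"
        return "accept"


gl = Greylist(min_delay=300)
args = ("192.0.2.10", "sender@example.com", "user@example.org")
print(gl.check(*args, now=0))    # defer: first time this triplet is seen
print(gl.check(*args, now=60))   # defer: retried too early
print(gl.check(*args, now=400))  # accept: retried after the minimum delay
```

During the enforced delay, a sending IP or URI that was clean at first contact has a chance to be listed, so the retried message is scored against fresher DNSBL and URIBL data.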
Regards,
Tom