Posted to users@spamassassin.apache.org by Charles Sprickman <sp...@bway.net> on 2016/02/29 06:24:50 UTC

Missed spam, suggestions?

Hi all,

Recently I've been getting occasional bursts of spam that slip through Postfix (postscreen BL checks, protocol checks) and SpamAssassin.  I just had another big jump in the last week.  This was mostly spam touting oil changes, SUV sales and lawyer finders.

I just went through a collection of missed spam and re-ran it through spamassassin. All of it jumped from originally scoring around 2-3 to a minimum of 6.5, with most hitting around 12.  The biggest difference I see is that DNSBL and URIBL services have started hitting. When originally received, these emails all originated from very clean IPs.

I have TXREP enabled as well, but that doesn’t seem to be having either a positive or negative impact.

What are my options to try to catch this junk before it hits the various *BLs?

I’ve not had much luck with Bayes. When I had it enabled recently on a per-user basis, it was hitting the master DB server too hard with updates.  I’m considering enabling it again with a shared DB for all users, which I hope might work better.  It would only be auto-trained, perhaps with some manual training by me.

Here are a few samples, hosted elsewhere so as not to trip anyone’s filters:

https://gist.github.com/anonymous/0fcaf481875959c9151f (2.7 on Friday, 14 tonight)

https://gist.github.com/anonymous/a5396f68699392808988 (3.4 earlier tonight, 6.5 just now)

I have more samples, I can dig them up if that’s helpful.

Sometimes I wonder how much this has to do with the age of our domain and the fact that it begins with “b”. :)

The only thing I’ve been contemplating is a local spamtrap and DNSBL.  We have a site that’s regularly trawled for email addresses, so seeding it should not be too difficult…

Charles
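
On the local DNSBL idea: a DNSBL lookup is an ordinary DNS query for the sending IP with its octets reversed and the list's zone appended. Below is a minimal Python sketch of building that query name. The zone `dnsbl.example.org` and the function name are placeholders for illustration, not a real list or anything SpamAssassin or Postfix provides.

```python
import ipaddress

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

# Hypothetical zone; 192.0.2.10 is a documentation-only address.
print(dnsbl_query_name("192.0.2.10", "dnsbl.example.org"))
# prints 10.2.0.192.dnsbl.example.org
```

A spamtrap feed would then only need to publish an A record under that name for each address it wants listed.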

Re: Missed spam, suggestions?

Posted by John Hardin <jh...@impsec.org>.
On Mon, 29 Feb 2016, Charles Sprickman wrote:

> My concern with disabling autolearn is that then I’m the only one 
> training.  My spam probably looks like everyone else’s, but my ham is 
> very different, lots of list traffic and such.

You can still have your users provide misses for training, you'd just need 
to vet the messages before feeding them to sa-learn (unless you really 
trust a given user's judgement and honesty - the big problem is users 
training messages from lists they actually did subscribe to as spam, 
rather than unsubscribing).
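
Part of that vetting can be done mechanically by holding back user-reported "spam" that carries mailing-list headers, since those are the messages most likely to be subscribed list traffic. Here is a rough Python sketch, assuming the reports are available as raw message text. The function name and the set of headers checked are illustrative choices, not anything sa-learn itself offers.

```python
from email import message_from_string

# Headers that usually indicate mailing-list traffic (RFC 2369 / RFC 2919).
LIST_HEADERS = ("List-Id", "List-Unsubscribe", "List-Post")

def needs_manual_review(raw_message: str) -> bool:
    """Return True if a user-reported message looks like list mail."""
    msg = message_from_string(raw_message)
    return any(msg.get(h) is not None for h in LIST_HEADERS)

# Made-up sample message for illustration.
sample = (
    "From: news@example.com\n"
    "List-Unsubscribe: <mailto:leave@example.com>\n"
    "Subject: Weekly digest\n"
    "\n"
    "Hello subscribers\n"
)
print(needs_manual_review(sample))
```

Messages flagged this way would go to a human for a look; everything else could be passed to training directly if the reporting user is trusted.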

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   We should endeavour to teach our children to be gun-proof
   rather than trying to design our guns to be child-proof
-----------------------------------------------------------------------
  13 days until Albert Einstein's 137th Birthday

Re: Missed spam, suggestions?

Posted by Robert Chalmers <ro...@chalmers.com.au>.
Sorry - I missed the post from dbfunk. I just saw it in the archive. sa-stats.pl is the program, 
and you have to feed it from spamd.log to get those stats.

To get a spamd.log, you have to start spamd with the -s option. The spamd documentation describes it like this:
-s facility, --syslog=facility
Specify the syslog facility to use (default: mail). If stderr is specified, output will be written to stderr. (This is useful if you're running spamd under the daemontools package.) With a facility of file, all output goes to spamd.log. facility is interpreted as a file name to log to if it contains any characters except a-z and 0-9. null disables logging completely (used internally).

spamd -s /var/log/spamd.log # log to file /var/log/spamd.log
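
The rule tables in this thread come from tallying the `result:` lines that spamd writes to that log. A rough Python sketch of the idea follows. The sample lines are made up, and the regular expression assumes the usual `result: Y|. SCORE - RULE,RULE,... scantime=...` layout, so check it against your own log before relying on it. This is only an illustration, not Dallas Engelken's sa-stats.pl.

```python
import re
from collections import Counter

# Assumed layout of a spamd result line:
#   "... result: Y 12 - RULE_A,RULE_B scantime=1.2,size=..."
RESULT_RE = re.compile(r"result: ([Y.]) +(-?\d+) - (\S*) scantime=")

def tally(lines):
    """Count rule hits separately for spam (Y) and ham (.) results."""
    spam_hits, ham_hits = Counter(), Counter()
    for line in lines:
        m = RESULT_RE.search(line)
        if not m:
            continue  # not a result line
        rules = [r for r in m.group(3).split(",") if r]
        (spam_hits if m.group(1) == "Y" else ham_hits).update(rules)
    return spam_hits, ham_hits

# Made-up sample lines for illustration only.
sample_log = [
    "spamd: result: Y 12 - BAYES_99,HTML_MESSAGE,URIBL_BLACK scantime=1.2,size=2048",
    "spamd: result: . -1 - BAYES_00,DKIM_VALID,HTML_MESSAGE scantime=0.8,size=1024",
]
spam, ham = tally(sample_log)
print(spam.most_common(3))
print(ham["HTML_MESSAGE"])
```

Dividing each counter value by the number of spam or ham messages gives the %OFSPAM and %OFHAM style columns.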






> On 10 Mar 2016, at 21:38, Erickarlo Porro <ep...@earthcam.com> wrote:
> 
> I would like to know how to get these stats too.
>  
> From: Robert Chalmers [mailto:robert@chalmers.com.au] 
> Sent: Tuesday, March 08, 2016 5:25 AM
> To: users@spamassassin.apache.org
> Subject: Re: Missed spam, suggestions?
>  
> Can I ask, how are you getting these stats please?
>  
> Thanks
> On 8 Mar 2016, at 05:11, David B Funk <dbfunk@engineering.uiowa.edu> wrote:
>  
> On Mon, 7 Mar 2016, Charles Sprickman wrote:
> 
> 
> I’ve been running with some daily training for a little over a week and I’m seeing less spam in my inbox.  I’ve seen a few things slip through because bayes tipped them below the default score; these were two phishing emails.
> 
> Here’s some rule stats for anyone interested:
> 
> TOP SPAM RULES FIRED
> 
> RANK  RULE NAME               COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
> 
>    1  TXREP                   13171      8.47    40.38    91.00   72.91
>    2  HTML_MESSAGE            12714      8.18    38.98    87.85   90.80
>    3  DCC_CHECK               10593      6.81    32.48    73.19   33.78
>    4  RDNS_NONE               10269      6.60    31.48    70.95    5.63
>    5  SPF_HELO_PASS           10070      6.48    30.87    69.58   23.41
>    6  URIBL_BLACK              9711      6.25    29.77    67.10    1.58
>    7  BODY_NEWDOMAIN_FMBLA     9550      6.14    29.28    65.98    1.64
>    8  FROM_NEWDOMAIN_FMBLA     9483      6.10    29.07    65.52    1.36
>    9  BAYES_99                 8486      5.46    26.02    58.63    1.18
>   10  BAYES_999                8141      5.24    24.96    56.25    1.06
> 
> TOP HAM RULES FIRED
> 
> RANK  RULE NAME               COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
> 
>    1  HTML_MESSAGE            16473      9.13    50.51    87.85   90.80
>    2  DKIM_SIGNED             13776      7.64    42.24    13.81   75.93
>    3  TXREP                   13228      7.33    40.56    91.00   72.91
>    4  DKIM_VALID              12962      7.19    39.74    11.93   71.44
>    5  RCVD_IN_DNSWL_NONE       9941      5.51    30.48     8.08   54.79
>    6  DKIM_VALID_AU            8711      4.83    26.71     7.99   48.01
>    7  BAYES_00                 8390      4.65    25.72     1.84   46.24
>    8  RCVD_IN_JMF_W            7369      4.09    22.59     2.54   40.62
>    9  RCVD_IN_MSPIKE_WL        6713      3.72    20.58     4.39   37.00
>   10  BAYES_50                 6201      3.44    19.01    25.56   34.18
> 
> 
> Based upon your stats it looks like you need more Bayes training. Your Bayes 00/99 hits should rank higher in the rules-fired stats and BAYES_50 shouldn't be in the top-10 at all.
> (of course if you've only been training for a week that would explain it).
> 
> For example, here's my top-10 hits (for a one month interval).
> 
> TOP SPAM RULES FIRED
> ----------------------------------------------------------------------
> RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
> ----------------------------------------------------------------------
>   1    T__BOTNET_NOTRUST               114907   60.32   86.81   42.66  0.5755
>   2    BAYES_99                        109138   32.98   82.45    0.01  0.9998
>   3    BAYES_999                       104903   31.70   79.25    0.01  0.9999
>   4    HTML_MESSAGE                    90850    79.41   68.63   86.59  0.3456
>   5    URIBL_BLACK                     90845    27.61   68.63    0.27  0.9942
>   6    T_QUARANTINE_1                  90640    27.40   68.47    0.02  0.9996
>   7    URIBL_DBL_SPAM                  79152    24.02   59.79    0.17  0.9956
>   8    KAM_VERY_BLACK_DBL              74301    22.45   56.13    0.00  1.0000
>   9    L_FROM_SPAMMER1k                73667    22.26   55.65    0.00  1.0000
>  10    T__RECEIVED_1                   72413    42.60   54.70   34.54  0.5135
> 
> TOP HAM RULES FIRED
> ----------------------------------------------------------------------
> RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
> ----------------------------------------------------------------------
>   1    BAYES_00                        182674   56.03    2.11   91.97  0.0150
>   2    HTML_MESSAGE                    171992   79.41   68.63   86.59  0.3456
>   3    SPF_PASS                        136623   63.08   54.52   68.78  0.3457
>   4    T_RP_MATCHES_RCVD               130879   53.75   35.54   65.89  0.2644
>   5    T__RECEIVED_2                   125492   53.76   39.62   63.18  0.2947
>   6    DKIM_SIGNED                     114808   38.57    9.72   57.80  0.1008
>   7    DKIM_VALID                      105385   34.70    7.16   53.06  0.0825
>   8    RCVD_IN_DNSWL_NONE              92951    29.90    4.56   46.80  0.0609
>   9    T__BOTNET_NOTRUST               84741    60.32   86.81   42.66  0.5755
>  10    KHOP_RCVD_TRUST                 84623    26.44    2.19   42.60  0.0331
> 
> Note how highly BAYES 00/99 ranked. What you don't see is that BAYES_50 is way down in the mud (below 50 rank).
> 
> BTW, this is with a Bayes that is mostly fed via auto-learning. I occasionally
> hand-feed corner cases that get misclassified (usually things like phishes, or conference announcements that can look shaky).
> 
> 
> -- 
> Dave Funk                                  University of Iowa
> <dbfunk (at) engineering.uiowa.edu>        College of Engineering
> 319/335-5751   FAX: 319/384-0549           1256 Seamans Center
> Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
> #include <std_disclaimer.h>
> Better is not better, 'standard' is better. B{
>  
> Robert Chalmers
> robert@chalmers.com.au  Quantum Radio: http://tinyurl.com/lwwddov
> Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11.  XCode 7.2.1
> 2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay

Robert Chalmers
robert@chalmers.com.au  Quantum Radio: http://tinyurl.com/lwwddov
Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11.  XCode 7.2.1
2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay
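
The S/O column in Dave Funk's quoted tables appears to be the fraction of a rule's hits that landed on spam, that is, spam count / (spam count + ham count). That reading reproduces the figures for the two rules that appear in both of his lists. A quick Python check using counts copied from those tables (the function name is made up for this sketch):

```python
def spam_over_overall(spam_count: int, ham_count: int) -> float:
    """Fraction of a rule's hits that were on spam (the S/O ratio)."""
    return spam_count / (spam_count + ham_count)

# Counts taken from the quoted TOP SPAM / TOP HAM tables.
print(round(spam_over_overall(90850, 171992), 4))   # HTML_MESSAGE: table shows 0.3456
print(round(spam_over_overall(114907, 84741), 4))   # T__BOTNET_NOTRUST: table shows 0.5755
```

A value near 1.0 means the rule fires almost only on spam and a value near 0.0 almost only on ham, which is why a well-trained BAYES_99 sits at 0.9998 and BAYES_00 at 0.0150.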





Re: Missed spam, suggestions?

Posted by Robert Chalmers <ro...@chalmers.com.au>.
Thanks, yes, confusion had set in there … now I’m on the right track

It will however be handy to have both.
Robert

> On 11 Mar 2016, at 14:59, Dave Funk <db...@engineering.uiowa.edu> wrote:
> 
> TL;DR
> You want Dallas Engelken's "sa-stats.pl" NOT the one from SA.
> 
> This is confusing because there are two different programs named "sa-stats.pl".
> 
> The one that comes with SpamAssassin (what you're referring to) is an engine stats reporting tool; it does not do rule-hit analysis.
> 
> The tool that Charles Sprickman and I used is the one from Dallas Engelken.
> See: http://wiki.apache.org/spamassassin/StatsAndAnalyzers
> be sure to search that page for reference to Dallas Engelken.
> 
> 
> 
> On Fri, 11 Mar 2016, Robert Chalmers wrote:
> 
>> The sa-stats.pl I refer to is here:
>> https://spamassassin.apache.org/full/3.0.x/dist/tools/sa-stats.pl
>> It’s not the same as the one shown in other posts; I don’t know what that one is.
>> Mine has output like this:
>> zeus:~ robert$ perl sa-stats.pl
>> Report Title     : SpamAssassin - Spam Statistics
>> Report Date      : 2016-03-11
>> Period Beginning : Fri 11 Mar 00:00:00 2016
>> Period Ending    : Sat 12 Mar 00:00:00 2016
>> Reporting Period : 24.00 hrs
>> --------------------------------------------------
>> Note: 'ham' = 'nonspam'
>> Total spam detected    :       22 (  51.16%)
>> Total ham accepted     :       21 (  48.84%)
>>                         -------------------
>> Total emails processed :       43 (    2/hr)
>> Average spam threshold :        3.00
>> Average spam score     :        4.46
>> Average ham score      :       -2.10
>> Spam kbytes processed  :      397   (   17 kb/hr)
>> Ham kbytes processed   :      147   (    6 kb/hr)
>> Total kbytes processed :      545   (   23 kb/hr)
>> Spam analysis time     :      339 s (   14 s/hr)
>> Ham analysis time      :      366 s (   15 s/hr)
>> Total analysis time    :      706 s (   29 s/hr)
>> Statistics by Hour
>> ----------------------------------------------------
>> Hour                          Spam               Ham
>> -------------    -----------------    --------------
>> 2016-03-11 00             0 (  0%)         13 (100%)
>> 2016-03-11 01             0 (  0%)          0 (  0%)
>> 2016-03-11 02             2 (100%)          0 (  0%)
>> 2016-03-11 03             4 (100%)          0 (  0%)
>> 2016-03-11 04             4 ( 57%)          3 ( 42%)
>> 2016-03-11 05             6 ( 75%)          2 ( 25%)
>> 2016-03-11 06             6 (100%)          0 (  0%)
>> 2016-03-11 07             0 (  0%)          3 (100%)
>> 2016-03-11 08             0 (  0%)          0 (  0%)
>> 2016-03-11 09             0 (  0%)          0 (  0%)
>> 2016-03-11 10             0 (  0%)          0 (  0%)
>> 2016-03-11 11             0 (  0%)          0 (  0%)
>> 2016-03-11 12             0 (  0%)          0 (  0%)
>> 2016-03-11 13             0 (  0%)          0 (  0%)
>> 2016-03-11 14             0 (  0%)          0 (  0%)
>> 2016-03-11 15             0 (  0%)          0 (  0%)
>> 2016-03-11 16             0 (  0%)          0 (  0%)
>> 2016-03-11 17             0 (  0%)          0 (  0%)
>> 2016-03-11 18             0 (  0%)          0 (  0%)
>> 2016-03-11 19             0 (  0%)          0 (  0%)
>> 2016-03-11 20             0 (  0%)          0 (  0%)
>> 2016-03-11 21             0 (  0%)          0 (  0%)
>> 2016-03-11 22             0 (  0%)          0 (  0%)
>> 2016-03-11 23             0 (  0%)          0 (  0%)
>> Done. Report generated in 1 sec by sa-stats.pl, version 6256.
>> 
> 
> -- 
> Dave Funk                                  University of Iowa
> <dbfunk (at) engineering.uiowa.edu>        College of Engineering
> 319/335-5751   FAX: 319/384-0549           1256 Seamans Center
> Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
> #include <std_disclaimer.h>
> Better is not better, 'standard' is better. B{

Robert Chalmers
robert@chalmers.com.au  Quantum Radio: http://tinyurl.com/lwwddov
Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11.  XCode 7.2.1
2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay





Re: Missed spam, suggestions?

Posted by John Hardin <jh...@impsec.org>.
On Fri, 11 Mar 2016, Robert Chalmers wrote:

> Found a copy here …
> http://www.impsec.org/~jhardin/antispam/sa-stats.pl

Note that I also host a version that works with gzipped log files, if you 
have compression enabled in your log rotator.

But that's not the latest. I don't know where the v1.03 David has came 
from. David, if you'd care to email me your copy, I'll see about updating 
the one I host.
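
For anyone writing their own log tooling, reading rotated, compressed logs is straightforward. Here is a Python sketch of opening a log the same way whether or not it is gzipped. The helper name is made up, and this shows only the general idea; it does not describe how the hosted sa-stats.pl variant is implemented.

```python
import gzip
import os
import tempfile

def open_log(path: str):
    """Open a log file as text, decompressing on the fly if it ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")

# Self-check with a throwaway compressed file holding one made-up log line.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "spamd.log.1.gz")
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        fh.write("spamd: result: . 0 - BAYES_00 scantime=0.5\n")
    with open_log(path) as fh:
        print(fh.readline().strip())
```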


-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   If you ask amateurs to act as front-line security personnel,
   you shouldn't be surprised when you get amateur security.
                                                     -- Bruce Schneier
-----------------------------------------------------------------------
  84 days since the first successful real return to launch site (SpaceX)

Re: Missed spam, suggestions?

Posted by Robert Chalmers <ro...@chalmers.com.au>.
Found a copy here …
http://www.impsec.org/~jhardin/antispam/sa-stats.pl


So finally found the right one. It does seem to be all working ok - at least to my eye.

./Sa_Stats.pl --logdir /var/log --filename spamd.log --num 18
Email:       53  Autolearn:    14  AvgScore:   1.02  AvgScanTime:  6.14 sec
Spam:        20  Autolearn:     0  AvgScore:   4.15  AvgScanTime:  5.29 sec
Ham:         33  Autolearn:    14  AvgScore:  -0.88  AvgScanTime:  6.65 sec

Time Spent Running SA:         0.09 hours
Time Spent Processing Spam:    0.03 hours
Time Spent Processing Ham:     0.06 hours
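
The summary lines are internally consistent: the overall averages are the message-count-weighted means of the Spam and Ham rows, and the running time is message count times average scan time. A small Python check with the numbers copied from the output above:

```python
# Values copied from the Email/Spam/Ham summary lines.
spam_n, spam_score, spam_time = 20, 4.15, 5.29
ham_n, ham_score, ham_time = 33, -0.88, 6.65
total = spam_n + ham_n

avg_score = (spam_n * spam_score + ham_n * ham_score) / total
avg_time = (spam_n * spam_time + ham_n * ham_time) / total   # seconds per message
hours = total * avg_time / 3600                              # total scan time in hours

print(round(avg_score, 2), round(avg_time, 2), round(hours, 2))
# prints 1.02 6.14 0.09
```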

TOP SPAM RULES FIRED
------------------------------------------------------------------------------
RANK	RULE NAME               	COUNT  %OFMAIL %OFSPAM  %OFHAM  AVGSCO        
------------------------------------------------------------------------------
   1	HTML_MESSAGE            	   20	 52.83	100.00	 24.24	  4.15
   2	SPF_PASS                	   17	 43.40	 85.00	 18.18	  3.76
   3	DCC_CHECK               	   15	 39.62	 75.00	 18.18	  4.33
   4	BAYES_50                	   14	 26.42	 70.00	  0.00	  3.86
   5	RDNS_NONE               	   13	 24.53	 65.00	  0.00	  4.15
   6	SPF_HELO_PASS           	   13	 24.53	 65.00	  0.00	  4.00
   7	T_REMOTE_IMAGE          	    8	 15.09	 40.00	  0.00	  3.75
   8	DKIM_SIGNED             	    6	 45.28	 30.00	 54.55	  3.17
   9	BAYES_999               	    6	 11.32	 30.00	  0.00	  4.83
  10	BAYES_99                	    6	 11.32	 30.00	  0.00	  4.83
  11	DKIM_VALID              	    6	 45.28	 30.00	 54.55	  3.17
  12	RP_MATCHES_RCVD         	    4	 30.19	 20.00	 36.36	  3.25
  13	DKIM_VALID_AU           	    4	 37.74	 20.00	 48.48	  3.00
  14	HTML_IMAGE_RATIO_02     	    3	  5.66	 15.00	  0.00	  3.67
  15	MPART_ALT_DIFF_COUNT    	    3	  5.66	 15.00	  0.00	  6.67
  16	MPART_ALT_DIFF          	    2	  3.77	 10.00	  0.00	  6.50
  17	FROM_12LTRDOM           	    2	  3.77	 10.00	  0.00	  3.00
  18	MORE_SEX                	    2	  3.77	 10.00	  0.00	  5.00
------------------------------------------------------------------------------

TOP HAM RULES FIRED
------------------------------------------------------------------------------
RANK	RULE NAME               	COUNT  %OFMAIL %OFSPAM  %OFHAM  AVGSCO        
------------------------------------------------------------------------------
   1	BAYES_00                	   32	 60.38	  0.00	 96.97	 -0.91
   2	HEADER_FROM_DIFFERENT_DOMAINS	   29	 56.60	  5.00	 87.88	 -0.83
   3	DKIM_VALID              	   18	 45.28	 30.00	 54.55	 -0.78
   4	DKIM_SIGNED             	   18	 45.28	 30.00	 54.55	 -0.78
   5	DKIM_VALID_AU           	   16	 37.74	 20.00	 48.48	 -0.88
   6	RP_MATCHES_RCVD         	   12	 30.19	 20.00	 36.36	 -1.08
   7	HTML_MESSAGE            	    8	 52.83	100.00	 24.24	 -0.88
   8	DCC_CHECK               	    6	 39.62	 75.00	 18.18	  0.17
   9	FREEMAIL_FORGED_FROMDOMAIN	    6	 11.32	  0.00	 18.18	 -1.17
  10	SPF_PASS                	    6	 43.40	 85.00	 18.18	 -1.17
  11	FREEMAIL_FROM           	    6	 13.21	  5.00	 18.18	 -1.17
  12	UNPARSEABLE_RELAY       	    3	  5.66	  0.00	  9.09	 -1.00
  13	DEAR_SOMETHING          	    2	  3.77	  0.00	  6.06	  0.50
  14	MSGID_FROM_MTA_HEADER   	    1	  1.89	  0.00	  3.03	 -1.00
  15	HTML_FONT_LOW_CONTRAST  	    1	  5.66	 10.00	  3.03	  0.00
  16	DKIM_ADSP_CUSTOM_MED    	    1	  1.89	  0.00	  3.03	 -1.00
  17	BAYES_05                	    1	  1.89	  0.00	  3.03	  0.00
  18	ALL_TRUSTED             	    1	  1.89	  0.00	  3.03	 -2.00
------------------------------------------------------------------------------





> On 11 Mar 2016, at 15:33, Robert Chalmers <ro...@chalmers.com.au> wrote:
> 
> 
> Just a note - that server address isn’t responding at the moment. Maybe later. Hopefully only temporary.
> 
> 
>>>      On 8 Mar 2016, at 05:11, David B Funk <dbfunk@engineering.uiowa.edu> wrote:
>>> Based upon your stats it looks like you need more Bayes training. Your Bayes 00/99 hits should rank higher in the rules-fired
>>> stats and BAYES_50 shouldn't be in the top-10 at all.
>>> (of course if you've only been training for a week that would explain it).
>>> For example, here's my top-10 hits (for a one month interval).
>>> TOP SPAM RULES FIRED
>>> ----------------------------------------------------------------------
>>> RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
>>> ----------------------------------------------------------------------
>>>   1    T__BOTNET_NOTRUST               114907   60.32   86.81   42.66  0.5755
>>>   2    BAYES_99                        109138   32.98   82.45    0.01  0.9998
>>>   3    BAYES_999                       104903   31.70   79.25    0.01  0.9999
>>>   4    HTML_MESSAGE                    90850    79.41   68.63   86.59  0.3456
>>>   5    URIBL_BLACK                     90845    27.61   68.63    0.27  0.9942
>>>   6    T_QUARANTINE_1                  90640    27.40   68.47    0.02  0.9996
>>>   7    URIBL_DBL_SPAM                  79152    24.02   59.79    0.17  0.9956
>>>   8    KAM_VERY_BLACK_DBL              74301    22.45   56.13    0.00  1.0000
>>>   9    L_FROM_SPAMMER1k                73667    22.26   55.65    0.00  1.0000
>>>  10    T__RECEIVED_1                   72413    42.60   54.70   34.54  0.5135
>>> TOP HAM RULES FIRED
>>> ----------------------------------------------------------------------
>>> RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
>>> ----------------------------------------------------------------------
>>>   1    BAYES_00                        182674   56.03    2.11   91.97  0.0150
>>>   2    HTML_MESSAGE                    171992   79.41   68.63   86.59  0.3456
>>>   3    SPF_PASS                        136623   63.08   54.52   68.78  0.3457
>>>   4    T_RP_MATCHES_RCVD               130879   53.75   35.54   65.89  0.2644
>>>   5    T__RECEIVED_2                   125492   53.76   39.62   63.18  0.2947
>>>   6    DKIM_SIGNED                     114808   38.57    9.72   57.80  0.1008
>>>   7    DKIM_VALID                      105385   34.70    7.16   53.06  0.0825
>>>   8    RCVD_IN_DNSWL_NONE              92951    29.90    4.56   46.80  0.0609
>>>   9    T__BOTNET_NOTRUST               84741    60.32   86.81   42.66  0.5755
>>>  10    KHOP_RCVD_TRUST                 84623    26.44    2.19   42.60  0.0331
>>> Note how highly BAYES 00/99 ranked. What you don't see is that BAYES_50 is way down in the mud (below 50 rank).
>>> BTW, this is with a Bayes that is mostly fed via auto-learning. I occasionally
>>> hand feed corner cases that get mis-classified (usually things like phishes, or conference announcements that can look shaky).
>>> -- 
>>> Dave Funk                                  University of Iowa
>>> <dbfunk (at) engineering.uiowa.edu>        College of Engineering
>>> 319/335-5751   FAX: 319/384-0549           1256 Seamans Center
>>> Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
>>> #include <std_disclaimer.h>
>>> Better is not better, 'standard' is better. B{
>>> 
>>> Robert Chalmers
>>> robert@chalmers.com.au  Quantum Radio: http://tinyurl.com/lwwddov
>>> Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11.  XCode 7.2.1
>>> 2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay
>>> 
>> 
>> -- 
>> Dave Funk                                  University of Iowa
>> <dbfunk (at) engineering.uiowa.edu>        College of Engineering
>> 319/335-5751   FAX: 319/384-0549           1256 Seamans Center
>> Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
>> #include <std_disclaimer.h>
>> Better is not better, 'standard' is better. B{
> 
> Robert Chalmers
> robert@chalmers.com.au  Quantum Radio: http://tinyurl.com/lwwddov
> Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11.  XCode 7.2.1
> 2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay
> 
> 
> 
> 

Robert Chalmers
robert@chalmers.com.au  Quantum Radio: http://tinyurl.com/lwwddov
Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11.  XCode 7.2.1
2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay





Re: Missed spam, suggestions?

Posted by Robert Chalmers <ro...@chalmers.com.au>.
Just a note - that server address isn’t responding at the moment. Maybe later. Hopefully it’s only temporary.


> On 11 Mar 2016, at 14:59, Dave Funk <db...@engineering.uiowa.edu> wrote:
> 
> TL;DR
> You want Dallas Engelken's "sa-stats.pl" NOT the one from SA.
> 
> This is confusing because there are two different programs named "sa-stats.pl".
> 
> The one that comes with SpamAssassin (what you're referring to) is an engine stats reporting tool; it does not do rule-hit analysis.
> 
> The tool that Charles Sprickman and I used is the one from Dallas Engelken.
> See: http://wiki.apache.org/spamassassin/StatsAndAnalyzers
> Be sure to search that page for references to Dallas Engelken.

Robert Chalmers
robert@chalmers.com.au  Quantum Radio: http://tinyurl.com/lwwddov
Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11.  XCode 7.2.1
2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay





Re: Missed spam, suggestions?

Posted by Dave Funk <db...@engineering.uiowa.edu>.
TL;DR
You want Dallas Engelken's "sa-stats.pl" NOT the one from SA.

This is confusing because there are two different programs named 
"sa-stats.pl".

The one that comes with SpamAssassin (what you're referring to) is an 
engine stats reporting tool; it does not do rule-hit analysis.

The tool that Charles Sprickman and I used is the one from Dallas 
Engelken.
See: http://wiki.apache.org/spamassassin/StatsAndAnalyzers
Be sure to search that page for references to Dallas Engelken.
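
For anyone wondering what a rule-hits analysis involves, here is a minimal Python sketch of the idea. It is not Dallas Engelken's script. It assumes spamd log lines shaped like "spamd: result: Y 12 - RULE_A,RULE_B scantime=..." (adjust the pattern if your logs differ), and the names tally and report are made up for illustration. S/O is computed as spam hits / (spam hits + ham hits), which is consistent with the figures quoted in this thread: T__BOTNET_NOTRUST shows 114907 spam hits and 84741 ham hits, and 114907 / (114907 + 84741) is about 0.5755.

```python
import re
from collections import Counter

# Assumed log shape (not taken from this thread):
#   "... spamd: result: Y 12 - BAYES_99,HTML_MESSAGE scantime=1.2,size=..."
# "Y" marks a message tagged as spam, "." marks ham.
RESULT_RE = re.compile(r"spamd: result: ([Y.]) (-?\d+) - (\S*) scantime=")


def tally(lines):
    """Count messages and per-rule hits, split into spam and ham."""
    n_spam = n_ham = 0
    spam_hits, ham_hits = Counter(), Counter()
    for line in lines:
        match = RESULT_RE.search(line)
        if not match:
            continue  # not a spamd result line
        rules = [r for r in match.group(3).split(",") if r]
        if match.group(1) == "Y":
            n_spam += 1
            spam_hits.update(rules)
        else:
            n_ham += 1
            ham_hits.update(rules)
    return n_spam, n_ham, spam_hits, ham_hits


def report(n_spam, n_ham, spam_hits, ham_hits, top=10):
    """Build 'top spam rules' rows: count, %OFSPAM, %OFHAM and S/O."""
    rows = []
    for rule, count in spam_hits.most_common(top):
        ham = ham_hits[rule]
        rows.append({
            "rule": rule,
            "count": count,
            "pct_spam": 100.0 * count / n_spam if n_spam else 0.0,
            "pct_ham": 100.0 * ham / n_ham if n_ham else 0.0,
            # S/O: share of this rule's hits that landed on spam.
            "s_o": count / (count + ham),
        })
    return rows
```

To try it, pass the lines of a mail log to tally() and hand the result to report(), for example report(*tally(open(path))) where path is wherever your spamd output is logged. A well-trained Bayes setup should then show BAYES_99 near the top with an S/O close to 1.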



On Fri, 11 Mar 2016, Robert Chalmers wrote:

> The sa-stats.pl I refer to is here:
> https://spamassassin.apache.org/full/3.0.x/dist/tools/sa-stats.pl
> It’s not the same as the one shown in other posts; I don’t know what that one is.
> 
> Its output looks like this:
> 
> zeus:~ robert$ perl sa-stats.pl
> Report Title     : SpamAssassin - Spam Statistics
> Report Date      : 2016-03-11
> Period Beginning : Fri 11 Mar 00:00:00 2016
> Period Ending    : Sat 12 Mar 00:00:00 2016
> 
> Reporting Period : 24.00 hrs
> --------------------------------------------------
> 
> Note: 'ham' = 'nonspam'
> 
> Total spam detected    :       22 (  51.16%)
> Total ham accepted     :       21 (  48.84%)
>                         -------------------
> Total emails processed :       43 (    2/hr)
> 
> Average spam threshold :        3.00
> Average spam score     :        4.46
> Average ham score      :       -2.10
> 
> Spam kbytes processed  :      397   (   17 kb/hr)
> Ham kbytes processed   :      147   (    6 kb/hr)
> Total kbytes processed :      545   (   23 kb/hr)
> 
> Spam analysis time     :      339 s (   14 s/hr)
> Ham analysis time      :      366 s (   15 s/hr)
> Total analysis time    :      706 s (   29 s/hr)
> 
> 
> Statistics by Hour
> ----------------------------------------------------
> Hour                          Spam               Ham
> -------------    -----------------    --------------
> 2016-03-11 00             0 (  0%)         13 (100%)
> 2016-03-11 01             0 (  0%)          0 (  0%)
> 2016-03-11 02             2 (100%)          0 (  0%)
> 2016-03-11 03             4 (100%)          0 (  0%)
> 2016-03-11 04             4 ( 57%)          3 ( 42%)
> 2016-03-11 05             6 ( 75%)          2 ( 25%)
> 2016-03-11 06             6 (100%)          0 (  0%)
> 2016-03-11 07             0 (  0%)          3 (100%)
> 2016-03-11 08             0 (  0%)          0 (  0%)
> 2016-03-11 09             0 (  0%)          0 (  0%)
> 2016-03-11 10             0 (  0%)          0 (  0%)
> 2016-03-11 11             0 (  0%)          0 (  0%)
> 2016-03-11 12             0 (  0%)          0 (  0%)
> 2016-03-11 13             0 (  0%)          0 (  0%)
> 2016-03-11 14             0 (  0%)          0 (  0%)
> 2016-03-11 15             0 (  0%)          0 (  0%)
> 2016-03-11 16             0 (  0%)          0 (  0%)
> 2016-03-11 17             0 (  0%)          0 (  0%)
> 2016-03-11 18             0 (  0%)          0 (  0%)
> 2016-03-11 19             0 (  0%)          0 (  0%)
> 2016-03-11 20             0 (  0%)          0 (  0%)
> 2016-03-11 21             0 (  0%)          0 (  0%)
> 2016-03-11 22             0 (  0%)          0 (  0%)
> 2016-03-11 23             0 (  0%)          0 (  0%)
> 
> 
> Done. Report generated in 1 sec by sa-stats.pl, version 6256.

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: Missed spam, suggestions?

Posted by Robert Chalmers <ro...@chalmers.com.au>.
The sa-stats.pl I refer to is here:

https://spamassassin.apache.org/full/3.0.x/dist/tools/sa-stats.pl

It’s not the same as the one shown in other posts; I don’t know what that one is. Its output looks like this:

zeus:~ robert$ perl sa-stats.pl
Report Title     : SpamAssassin - Spam Statistics
Report Date      : 2016-03-11
Period Beginning : Fri 11 Mar 00:00:00 2016
Period Ending    : Sat 12 Mar 00:00:00 2016

Reporting Period : 24.00 hrs
--------------------------------------------------

Note: 'ham' = 'nonspam'

Total spam detected    :       22 (  51.16%)
Total ham accepted     :       21 (  48.84%)
                        -------------------
Total emails processed :       43 (    2/hr)

Average spam threshold :        3.00
Average spam score     :        4.46
Average ham score      :       -2.10

Spam kbytes processed  :      397   (   17 kb/hr)
Ham kbytes processed   :      147   (    6 kb/hr)
Total kbytes processed :      545   (   23 kb/hr)

Spam analysis time     :      339 s (   14 s/hr)
Ham analysis time      :      366 s (   15 s/hr)
Total analysis time    :      706 s (   29 s/hr)


Statistics by Hour
----------------------------------------------------
Hour                          Spam               Ham
-------------    -----------------    --------------
2016-03-11 00             0 (  0%)         13 (100%)
2016-03-11 01             0 (  0%)          0 (  0%)
2016-03-11 02             2 (100%)          0 (  0%)
2016-03-11 03             4 (100%)          0 (  0%)
2016-03-11 04             4 ( 57%)          3 ( 42%)
2016-03-11 05             6 ( 75%)          2 ( 25%)
2016-03-11 06             6 (100%)          0 (  0%)
2016-03-11 07             0 (  0%)          3 (100%)
2016-03-11 08             0 (  0%)          0 (  0%)
2016-03-11 09             0 (  0%)          0 (  0%)
2016-03-11 10             0 (  0%)          0 (  0%)
2016-03-11 11             0 (  0%)          0 (  0%)
2016-03-11 12             0 (  0%)          0 (  0%)
2016-03-11 13             0 (  0%)          0 (  0%)
2016-03-11 14             0 (  0%)          0 (  0%)
2016-03-11 15             0 (  0%)          0 (  0%)
2016-03-11 16             0 (  0%)          0 (  0%)
2016-03-11 17             0 (  0%)          0 (  0%)
2016-03-11 18             0 (  0%)          0 (  0%)
2016-03-11 19             0 (  0%)          0 (  0%)
2016-03-11 20             0 (  0%)          0 (  0%)
2016-03-11 21             0 (  0%)          0 (  0%)
2016-03-11 22             0 (  0%)          0 (  0%)
2016-03-11 23             0 (  0%)          0 (  0%)


Done. Report generated in 1 sec by sa-stats.pl, version 6256.
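
The summary figures in the report above can be cross-checked with a little arithmetic. The sketch below is Python, not code from sa-stats.pl; the function name summarize is hypothetical and the inputs are copied from the report. The seconds-per-message figure is derived here and is not something the report prints.

```python
def summarize(spam, ham, hours, total_secs):
    """Recompute the headline figures of the report from its raw counts."""
    total = spam + ham
    return {
        "total": total,
        "spam_pct": round(100.0 * spam / total, 2),
        "ham_pct": round(100.0 * ham / total, 2),
        "msgs_per_hour": total / hours,
        # Derived value: average analysis time per message.
        "secs_per_msg": total_secs / total,
    }


# Values taken from the report: 22 spam, 21 ham, 24.00 hrs, 706 s total.
stats = summarize(spam=22, ham=21, hours=24.0, total_secs=706)
print(stats)

# The hourly table seems to truncate percentages rather than round them:
# hour 04 shows 57% and 42% for 4 spam and 3 ham out of 7 messages.
print(int(100 * 4 / 7), int(100 * 3 / 7))
```

The percentages match the report (51.16% and 48.84%), and 43 messages over 24 hours is a little under 2 per hour. The derived average of roughly 16 seconds of analysis per message may be worth a look; one possible cause, offered only as a guess, is network tests timing out.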

> On 10 Mar 2016, at 21:38, Erickarlo Porro <ep...@earthcam.com> wrote:
> 
> I would like to know how to get these stats too.
>  
> From: Robert Chalmers [mailto:robert@chalmers.com.au] 
> Sent: Tuesday, March 08, 2016 5:25 AM
> To: users@spamassassin.apache.org
> Subject: Re: Missed spam, suggestions?
>  
> Can I ask, how are you getting these stats please?
>  
> Thanks
> On 8 Mar 2016, at 05:11, David B Funk <dbfunk@engineering.uiowa.edu> wrote:
>  
> On Mon, 7 Mar 2016, Charles Sprickman wrote:
> 
> 
> I’ve been running with some daily training for a little over a week and I’m seeing less spam in my inbox.  I’ve seen a few things slip through because Bayes tipped them below the default score; these were two phishing emails.
> 
> Here’s some rule stats for anyone interested:
> 
> TOP SPAM RULES FIRED
> 
> RANK  RULE NAME               COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
> 
>    1  TXREP                   13171      8.47    40.38    91.00   72.91
>    2  HTML_MESSAGE            12714      8.18    38.98    87.85   90.80
>    3  DCC_CHECK               10593      6.81    32.48    73.19   33.78
>    4  RDNS_NONE               10269      6.60    31.48    70.95    5.63
>    5  SPF_HELO_PASS           10070      6.48    30.87    69.58   23.41
>    6  URIBL_BLACK              9711      6.25    29.77    67.10    1.58
>    7  BODY_NEWDOMAIN_FMBLA     9550      6.14    29.28    65.98    1.64
>    8  FROM_NEWDOMAIN_FMBLA     9483      6.10    29.07    65.52    1.36
>    9  BAYES_99                 8486      5.46    26.02    58.63    1.18
>   10  BAYES_999                8141      5.24    24.96    56.25    1.06
> 
> TOP HAM RULES FIRED
> 
> RANK  RULE NAME               COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
> 
>    1  HTML_MESSAGE            16473      9.13    50.51    87.85   90.80
>    2  DKIM_SIGNED             13776      7.64    42.24    13.81   75.93
>    3  TXREP                   13228      7.33    40.56    91.00   72.91
>    4  DKIM_VALID              12962      7.19    39.74    11.93   71.44
>    5  RCVD_IN_DNSWL_NONE       9941      5.51    30.48     8.08   54.79
>    6  DKIM_VALID_AU            8711      4.83    26.71     7.99   48.01
>    7  BAYES_00                 8390      4.65    25.72     1.84   46.24
>    8  RCVD_IN_JMF_W            7369      4.09    22.59     2.54   40.62
>    9  RCVD_IN_MSPIKE_WL        6713      3.72    20.58     4.39   37.00
>   10  BAYES_50                 6201      3.44    19.01    25.56   34.18
> 
> 
> Based upon your stats it looks like you need more Bayes training. Your Bayes 00/99 hits should rank higher in the rules-fired stats and BAYES_50 shouldn't be in the top-10 at all.
> (of course if you've only been training for a week that would explain it).
> 
> For example, here's my top-10 hits (for a one month interval).
> 
> TOP SPAM RULES FIRED
> ----------------------------------------------------------------------
> RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
> ----------------------------------------------------------------------
>   1    T__BOTNET_NOTRUST               114907   60.32   86.81   42.66  0.5755
>   2    BAYES_99                        109138   32.98   82.45    0.01  0.9998
>   3    BAYES_999                       104903   31.70   79.25    0.01  0.9999
>   4    HTML_MESSAGE                    90850    79.41   68.63   86.59  0.3456
>   5    URIBL_BLACK                     90845    27.61   68.63    0.27  0.9942
>   6    T_QUARANTINE_1                  90640    27.40   68.47    0.02  0.9996
>   7    URIBL_DBL_SPAM                  79152    24.02   59.79    0.17  0.9956
>   8    KAM_VERY_BLACK_DBL              74301    22.45   56.13    0.00  1.0000
>   9    L_FROM_SPAMMER1k                73667    22.26   55.65    0.00  1.0000
>  10    T__RECEIVED_1                   72413    42.60   54.70   34.54  0.5135
> 
> TOP HAM RULES FIRED
> ----------------------------------------------------------------------
> RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
> ----------------------------------------------------------------------
>   1    BAYES_00                        182674   56.03    2.11   91.97  0.0150
>   2    HTML_MESSAGE                    171992   79.41   68.63   86.59  0.3456
>   3    SPF_PASS                        136623   63.08   54.52   68.78  0.3457
>   4    T_RP_MATCHES_RCVD               130879   53.75   35.54   65.89  0.2644
>   5    T__RECEIVED_2                   125492   53.76   39.62   63.18  0.2947
>   6    DKIM_SIGNED                     114808   38.57    9.72   57.80  0.1008
>   7    DKIM_VALID                      105385   34.70    7.16   53.06  0.0825
>   8    RCVD_IN_DNSWL_NONE              92951    29.90    4.56   46.80  0.0609
>   9    T__BOTNET_NOTRUST               84741    60.32   86.81   42.66  0.5755
>  10    KHOP_RCVD_TRUST                 84623    26.44    2.19   42.60  0.0331
> 
> Note how highly BAYES_00 and BAYES_99 ranked. What you don't see is that BAYES_50 is way down in the mud (below rank 50).
> 
> BTW, this is with a Bayes that is mostly fed via auto-learning. I occasionally
> hand-feed corner cases that get misclassified (usually things like phishes, or conference announcements that can look shaky).
> 
> 
> -- 
> Dave Funk                                  University of Iowa
> <dbfunk (at) engineering.uiowa.edu>        College of Engineering
> 319/335-5751   FAX: 319/384-0549           1256 Seamans Center
> Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
> #include <std_disclaimer.h>
> Better is not better, 'standard' is better. B{
>  
> Robert Chalmers
> robert@chalmers.com.au  Quantum Radio: http://tinyurl.com/lwwddov
> Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11.  XCode 7.2.1
> 2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay

Robert Chalmers
robert@chalmers.com.au  Quantum Radio: http://tinyurl.com/lwwddov
Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11.  XCode 7.2.1
2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay





Re: Missed spam, suggestions?

Posted by Robert Chalmers <ro...@chalmers.com.au>.
sa-stats.pl
It is sometimes shipped as part of the SpamAssassin package. You may have to search for it on your system; otherwise, it’s available via CPAN.




> On 10 Mar 2016, at 21:38, Erickarlo Porro <ep...@earthcam.com> wrote:
> 
> I would like to know how to get these stats too.
>  
> From: Robert Chalmers [mailto:robert@chalmers.com.au] 
> Sent: Tuesday, March 08, 2016 5:25 AM
> To: users@spamassassin.apache.org
> Subject: Re: Missed spam, suggestions?
>  
> Can I ask, how are you getting these stats please?
>  
> Thanks
> On 8 Mar 2016, at 05:11, David B Funk <dbfunk@engineering.uiowa.edu> wrote:
>  
> On Mon, 7 Mar 2016, Charles Sprickman wrote:
> 
> 
> I’ve been running with some daily training for a little over a week and I’m seeing less spam in my inbox.  A few things have slipped through because Bayes tipped them below the default score; these were two phishing emails.
> 
> Here are some rule stats for anyone interested:
> 
> TOP SPAM RULES FIRED
> 
> RANK  RULE NAME               COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
> 
>    1  TXREP                   13171      8.47    40.38    91.00   72.91
>    2  HTML_MESSAGE            12714      8.18    38.98    87.85   90.80
>    3  DCC_CHECK               10593      6.81    32.48    73.19   33.78
>    4  RDNS_NONE               10269      6.60    31.48    70.95    5.63
>    5  SPF_HELO_PASS           10070      6.48    30.87    69.58   23.41
>    6  URIBL_BLACK              9711      6.25    29.77    67.10    1.58
>    7  BODY_NEWDOMAIN_FMBLA     9550      6.14    29.28    65.98    1.64
>    8  FROM_NEWDOMAIN_FMBLA     9483      6.10    29.07    65.52    1.36
>    9  BAYES_99                 8486      5.46    26.02    58.63    1.18
>   10  BAYES_999                8141      5.24    24.96    56.25    1.06
> 
> TOP HAM RULES FIRED
> 
> RANK  RULE NAME               COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
> 
>    1  HTML_MESSAGE            16473      9.13    50.51    87.85   90.80
>    2  DKIM_SIGNED             13776      7.64    42.24    13.81   75.93
>    3  TXREP                   13228      7.33    40.56    91.00   72.91
>    4  DKIM_VALID              12962      7.19    39.74    11.93   71.44
>    5  RCVD_IN_DNSWL_NONE       9941      5.51    30.48     8.08   54.79
>    6  DKIM_VALID_AU            8711      4.83    26.71     7.99   48.01
>    7  BAYES_00                 8390      4.65    25.72     1.84   46.24
>    8  RCVD_IN_JMF_W            7369      4.09    22.59     2.54   40.62
>    9  RCVD_IN_MSPIKE_WL        6713      3.72    20.58     4.39   37.00
>   10  BAYES_50                 6201      3.44    19.01    25.56   34.18
> 
> 
> Based upon your stats it looks like you need more Bayes training. Your Bayes 00/99 hits should rank higher in the rules-fired stats and BAYES_50 shouldn't be in the top-10 at all.
> (of course if you've only been training for a week that would explain it).
> 
> For example, here's my top-10 hits (for a one month interval).
> 
> TOP SPAM RULES FIRED
> ----------------------------------------------------------------------
> RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
> ----------------------------------------------------------------------
>   1    T__BOTNET_NOTRUST               114907   60.32   86.81   42.66  0.5755
>   2    BAYES_99                        109138   32.98   82.45    0.01  0.9998
>   3    BAYES_999                       104903   31.70   79.25    0.01  0.9999
>   4    HTML_MESSAGE                    90850    79.41   68.63   86.59  0.3456
>   5    URIBL_BLACK                     90845    27.61   68.63    0.27  0.9942
>   6    T_QUARANTINE_1                  90640    27.40   68.47    0.02  0.9996
>   7    URIBL_DBL_SPAM                  79152    24.02   59.79    0.17  0.9956
>   8    KAM_VERY_BLACK_DBL              74301    22.45   56.13    0.00  1.0000
>   9    L_FROM_SPAMMER1k                73667    22.26   55.65    0.00  1.0000
>  10    T__RECEIVED_1                   72413    42.60   54.70   34.54  0.5135
> 
> TOP HAM RULES FIRED
> ----------------------------------------------------------------------
> RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
> ----------------------------------------------------------------------
>   1    BAYES_00                        182674   56.03    2.11   91.97  0.0150
>   2    HTML_MESSAGE                    171992   79.41   68.63   86.59  0.3456
>   3    SPF_PASS                        136623   63.08   54.52   68.78  0.3457
>   4    T_RP_MATCHES_RCVD               130879   53.75   35.54   65.89  0.2644
>   5    T__RECEIVED_2                   125492   53.76   39.62   63.18  0.2947
>   6    DKIM_SIGNED                     114808   38.57    9.72   57.80  0.1008
>   7    DKIM_VALID                      105385   34.70    7.16   53.06  0.0825
>   8    RCVD_IN_DNSWL_NONE              92951    29.90    4.56   46.80  0.0609
>   9    T__BOTNET_NOTRUST               84741    60.32   86.81   42.66  0.5755
>  10    KHOP_RCVD_TRUST                 84623    26.44    2.19   42.60  0.0331
> 
> Note how highly BAYES_00 and BAYES_99 ranked. What you don't see is that BAYES_50 is way down in the mud (below rank 50).
> 
> BTW, this is with a Bayes that is mostly fed via auto-learning. I occasionally
> hand-feed corner cases that get misclassified (usually things like phishes, or conference announcements that can look shaky).
> 
> 
> -- 
> Dave Funk                                  University of Iowa
> <dbfunk (at) engineering.uiowa.edu>        College of Engineering
> 319/335-5751   FAX: 319/384-0549           1256 Seamans Center
> Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
> #include <std_disclaimer.h>
> Better is not better, 'standard' is better. B{
>  
> Robert Chalmers
> robert@chalmers.com.au  Quantum Radio: http://tinyurl.com/lwwddov
> Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11.  XCode 7.2.1
> 2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay

Robert Chalmers
robert@chalmers.com.au  Quantum Radio: http://tinyurl.com/lwwddov
Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11.  XCode 7.2.1
2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay





Re: sa-stats log analyzer (RE: Missed spam, suggestions?)

Posted by "robert@chalmers.com.au" <ro...@chalmers.com.au>.
The rulesemporium site appears to be down.
If anyone has a newer version, it might be good to post it somewhere. My site, for example?

Robert


Sent from my iPad

> On 11 Mar 2016, at 04:17, David B Funk <db...@engineering.uiowa.edu> wrote:
> 
> That's the output from Dallas Engelken's "sa-stats.pl" log analyzer.
> You feed it a segment of your spamd logs and it gives you
> those rule hit statistics.
> 
> See: http://wiki.apache.org/spamassassin/StatsAndAnalyzers
> 
> Looking at that wiki page, I noticed that the copy available is v0.93.
> I've got v1.03
> Does anybody know which was the newest version last available on the rulesemporium site? Has anybody got something newer than v1.03?
> 
> I've done a bit of hacking to my copy (such as adding the S/O ratio stats).
> 
> 
>> On Thu, 10 Mar 2016, Erickarlo Porro wrote:
>> 
>> I would like to know how to get these stats too.
>>  
>> From: Robert Chalmers [mailto:robert@chalmers.com.au]
>> Sent: Tuesday, March 08, 2016 5:25 AM
>> To: users@spamassassin.apache.org
>> Subject: Re: Missed spam, suggestions?
>>  
>> Can I ask, how are you getting these stats please?
>>  
>> Thanks
>> 
>>      On 8 Mar 2016, at 05:11, David B Funk <db...@engineering.uiowa.edu> wrote:
>>  
>> On Mon, 7 Mar 2016, Charles Sprickman wrote:
>> 
>>      I’ve been running with some daily training for a little over a week and I’m seeing less spam in my
>>      inbox.  A few things have slipped through because Bayes tipped them below the default score; these
>>      were two phishing emails.
>> 
>>      Here are some rule stats for anyone interested:
>> 
>>      TOP SPAM RULES FIRED
>> 
>>      RANK  RULE NAME               COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
>> 
>>         1  TXREP                   13171      8.47    40.38    91.00   72.91
>>         2  HTML_MESSAGE            12714      8.18    38.98    87.85   90.80
>>         3  DCC_CHECK               10593      6.81    32.48    73.19   33.78
>>         4  RDNS_NONE               10269      6.60    31.48    70.95    5.63
>>         5  SPF_HELO_PASS           10070      6.48    30.87    69.58   23.41
>>         6  URIBL_BLACK              9711      6.25    29.77    67.10    1.58
>>         7  BODY_NEWDOMAIN_FMBLA     9550      6.14    29.28    65.98    1.64
>>         8  FROM_NEWDOMAIN_FMBLA     9483      6.10    29.07    65.52    1.36
>>         9  BAYES_99                 8486      5.46    26.02    58.63    1.18
>>        10  BAYES_999                8141      5.24    24.96    56.25    1.06
>> 
>>      TOP HAM RULES FIRED
>> 
>>      RANK  RULE NAME               COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
>> 
>>         1  HTML_MESSAGE            16473      9.13    50.51    87.85   90.80
>>         2  DKIM_SIGNED             13776      7.64    42.24    13.81   75.93
>>         3  TXREP                   13228      7.33    40.56    91.00   72.91
>>         4  DKIM_VALID              12962      7.19    39.74    11.93   71.44
>>         5  RCVD_IN_DNSWL_NONE       9941      5.51    30.48     8.08   54.79
>>         6  DKIM_VALID_AU            8711      4.83    26.71     7.99   48.01
>>         7  BAYES_00                 8390      4.65    25.72     1.84   46.24
>>         8  RCVD_IN_JMF_W            7369      4.09    22.59     2.54   40.62
>>         9  RCVD_IN_MSPIKE_WL        6713      3.72    20.58     4.39   37.00
>>        10  BAYES_50                 6201      3.44    19.01    25.56   34.18
>> Based upon your stats it looks like you need more Bayes training. Your Bayes 00/99 hits should rank higher in the
>> rules-fired stats and BAYES_50 shouldn't be in the top-10 at all.
>> (of course if you've only been training for a week that would explain it).
>> For example, here's my top-10 hits (for a one month interval).
>> TOP SPAM RULES FIRED
>> ----------------------------------------------------------------------
>> RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
>> ----------------------------------------------------------------------
>>   1    T__BOTNET_NOTRUST               114907   60.32   86.81   42.66  0.5755
>>   2    BAYES_99                        109138   32.98   82.45    0.01  0.9998
>>   3    BAYES_999                       104903   31.70   79.25    0.01  0.9999
>>   4    HTML_MESSAGE                    90850    79.41   68.63   86.59  0.3456
>>   5    URIBL_BLACK                     90845    27.61   68.63    0.27  0.9942
>>   6    T_QUARANTINE_1                  90640    27.40   68.47    0.02  0.9996
>>   7    URIBL_DBL_SPAM                  79152    24.02   59.79    0.17  0.9956
>>   8    KAM_VERY_BLACK_DBL              74301    22.45   56.13    0.00  1.0000
>>   9    L_FROM_SPAMMER1k                73667    22.26   55.65    0.00  1.0000
>>  10    T__RECEIVED_1                   72413    42.60   54.70   34.54  0.5135
>> TOP HAM RULES FIRED
>> ----------------------------------------------------------------------
>> RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
>> ----------------------------------------------------------------------
>>   1    BAYES_00                        182674   56.03    2.11   91.97  0.0150
>>   2    HTML_MESSAGE                    171992   79.41   68.63   86.59  0.3456
>>   3    SPF_PASS                        136623   63.08   54.52   68.78  0.3457
>>   4    T_RP_MATCHES_RCVD               130879   53.75   35.54   65.89  0.2644
>>   5    T__RECEIVED_2                   125492   53.76   39.62   63.18  0.2947
>>   6    DKIM_SIGNED                     114808   38.57    9.72   57.80  0.1008
>>   7    DKIM_VALID                      105385   34.70    7.16   53.06  0.0825
>>   8    RCVD_IN_DNSWL_NONE              92951    29.90    4.56   46.80  0.0609
>>   9    T__BOTNET_NOTRUST               84741    60.32   86.81   42.66  0.5755
>>  10    KHOP_RCVD_TRUST                 84623    26.44    2.19   42.60  0.0331
>> Note how highly BAYES_00 and BAYES_99 ranked. What you don't see is that BAYES_50 is way down in the mud (below rank 50).
>> BTW, this is with a Bayes that is mostly fed via auto-learning. I occasionally
>> hand-feed corner cases that get misclassified (usually things like phishes, or conference announcements that can
>> look shaky).
>> --
>> Dave Funk                                  University of Iowa
>> <dbfunk (at) engineering.uiowa.edu>        College of Engineering
>> 319/335-5751   FAX: 319/384-0549           1256 Seamans Center
>> Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
>> #include <std_disclaimer.h>
>> Better is not better, 'standard' is better. B{
>>  
>> Robert Chalmers
>> robert@chalmers.com.au  Quantum Radio: http://tinyurl.com/lwwddov
>> Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11.  XCode 7.2.1
>> 2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay
>>  
>>  
>>  
>> 
> 
> -- 
> Dave Funk                                  University of Iowa
> <dbfunk (at) engineering.uiowa.edu>        College of Engineering
> 319/335-5751   FAX: 319/384-0549           1256 Seamans Center
> Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
> #include <std_disclaimer.h>
> Better is not better, 'standard' is better. B{

sa-stats log analyzer (RE: Missed spam, suggestions?)

Posted by David B Funk <db...@engineering.uiowa.edu>.
That's the output from Dallas Engelken's "sa-stats.pl" log analyzer.
You feed it a segment of your spamd logs and it gives you
those rule hit statistics.
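
For anyone who wants a feel for what such an analyzer does before tracking down the script, here is a minimal Python sketch. It is not sa-stats.pl and does not claim to follow its logic. It assumes spamd writes one "result:" line per message, with a Y or . verdict, a score, and a comma-separated rule list; the regular expression, the function names, and the sample lines are all made up for illustration, so adjust them to whatever your own logs look like.

```python
import re
from collections import Counter

# Assumed shape of a spamd "result:" log line (an illustration, not a spec):
#   ... spamd: result: Y 14 - RULE_A,RULE_B scantime=1.2,size=2048,...
RESULT_RE = re.compile(
    r"spamd: result: (?P<verdict>[Y.])\s+(?P<score>-?\d+)\s+-\s+"
    r"(?P<rules>\S*)\s+scantime="
)


def count_rule_hits(lines):
    """Tally rule hits separately for messages flagged spam (Y) and ham (.)."""
    spam_hits, ham_hits = Counter(), Counter()
    n_spam = n_ham = 0
    for line in lines:
        match = RESULT_RE.search(line)
        if not match:
            continue  # not a result line
        rules = [r for r in match.group("rules").split(",") if r]
        if match.group("verdict") == "Y":
            n_spam += 1
            spam_hits.update(rules)
        else:
            n_ham += 1
            ham_hits.update(rules)
    return spam_hits, ham_hits, n_spam, n_ham


def top_rules(hits, total, n=10):
    """Return (rule, count, percent of messages in this class) for the top n rules."""
    if not total:
        return []
    return [(rule, count, 100.0 * count / total) for rule, count in hits.most_common(n)]


# Invented sample lines, only to exercise the functions above.
SAMPLE_LOG = [
    "Mar  8 05:11:01 mx spamd[123]: spamd: result: Y 14 - BAYES_99,HTML_MESSAGE,URIBL_BLACK scantime=1.2,size=2048,user=a",
    "Mar  8 05:11:02 mx spamd[123]: spamd: result: . 0 - BAYES_00,HTML_MESSAGE scantime=0.8,size=1024,user=b",
    "Mar  8 05:11:03 mx spamd[124]: spamd: result: . -1 - BAYES_00,DKIM_SIGNED scantime=0.5,size=900,user=c",
]

spam_hits, ham_hits, n_spam, n_ham = count_rule_hits(SAMPLE_LOG)
for title, hits, total in (("SPAM", spam_hits, n_spam), ("HAM", ham_hits, n_ham)):
    print(f"TOP {title} RULES FIRED")
    for rank, (rule, count, pct) in enumerate(top_rules(hits, total), start=1):
        print(f"{rank:4d}  {rule:<22}{count:7d}{pct:9.2f}")
```

To try it on real data, replace SAMPLE_LOG with the lines of a log segment, for example by iterating over an open file.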

See: http://wiki.apache.org/spamassassin/StatsAndAnalyzers

Looking at that wiki page, I noticed that the copy available there is v0.93.
I've got v1.03.
Does anybody know which was the newest version last available on the rulesemporium
site? Has anybody got something newer than v1.03?

I've done a bit of hacking to my copy (such as adding the S/O ratio stats).
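
The thread does not spell out how the S/O column is computed. One reading that reproduces the figures in the quoted tables is the share of a rule's hits that landed on spam: spam hits divided by spam hits plus ham hits. The sketch below works under that assumption; the helper name is made up, and the counts are copied from the HTML_MESSAGE and T__BOTNET_NOTRUST rows of the spam and ham tables.

```python
def s_o_ratio(spam_hits: int, ham_hits: int) -> float:
    """Fraction of a rule's hits that were on spam (assumed meaning of S/O)."""
    total = spam_hits + ham_hits
    return spam_hits / total if total else 0.0


# Counts taken from the quoted tables: (spam-table COUNT, ham-table COUNT).
examples = {
    "HTML_MESSAGE": (90850, 171992),       # table lists S/O 0.3456
    "T__BOTNET_NOTRUST": (114907, 84741),  # table lists S/O 0.5755
}

for rule, (spam, ham) in examples.items():
    print(f"{rule:<20} {s_o_ratio(spam, ham):.4f}")
```

A value near 1.0 means the rule fires almost only on spam, a value near 0.0 almost only on ham, and something around 0.5 says little either way.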


On Thu, 10 Mar 2016, Erickarlo Porro wrote:

> 
> I would like to know how to get these stats too.
> 
>  
> 
> From: Robert Chalmers [mailto:robert@chalmers.com.au]
> Sent: Tuesday, March 08, 2016 5:25 AM
> To: users@spamassassin.apache.org
> Subject: Re: Missed spam, suggestions?
> 
>  
> 
> Can I ask, how are you getting these stats please?
> 
>  
> 
> Thanks
>
>       On 8 Mar 2016, at 05:11, David B Funk <db...@engineering.uiowa.edu> wrote:
> 
>  
> 
> On Mon, 7 Mar 2016, Charles Sprickman wrote:
> 
>
>       I’ve been running with some daily training for a little over a week and I’m seeing less spam in my
>       inbox.  A few things have slipped through because Bayes tipped them below the default score; these
>       were two phishing emails.
>
>       Here are some rule stats for anyone interested:
>
>       TOP SPAM RULES FIRED
>
>       RANK  RULE NAME               COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
>
>          1  TXREP                   13171      8.47    40.38    91.00   72.91
>          2  HTML_MESSAGE            12714      8.18    38.98    87.85   90.80
>          3  DCC_CHECK               10593      6.81    32.48    73.19   33.78
>          4  RDNS_NONE               10269      6.60    31.48    70.95    5.63
>          5  SPF_HELO_PASS           10070      6.48    30.87    69.58   23.41
>          6  URIBL_BLACK              9711      6.25    29.77    67.10    1.58
>          7  BODY_NEWDOMAIN_FMBLA     9550      6.14    29.28    65.98    1.64
>          8  FROM_NEWDOMAIN_FMBLA     9483      6.10    29.07    65.52    1.36
>          9  BAYES_99                 8486      5.46    26.02    58.63    1.18
>         10  BAYES_999                8141      5.24    24.96    56.25    1.06
>
>       TOP HAM RULES FIRED
>
>       RANK  RULE NAME               COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
>
>          1  HTML_MESSAGE            16473      9.13    50.51    87.85   90.80
>          2  DKIM_SIGNED             13776      7.64    42.24    13.81   75.93
>          3  TXREP                   13228      7.33    40.56    91.00   72.91
>          4  DKIM_VALID              12962      7.19    39.74    11.93   71.44
>          5  RCVD_IN_DNSWL_NONE       9941      5.51    30.48     8.08   54.79
>          6  DKIM_VALID_AU            8711      4.83    26.71     7.99   48.01
>          7  BAYES_00                 8390      4.65    25.72     1.84   46.24
>          8  RCVD_IN_JMF_W            7369      4.09    22.59     2.54   40.62
>          9  RCVD_IN_MSPIKE_WL        6713      3.72    20.58     4.39   37.00
>         10  BAYES_50                 6201      3.44    19.01    25.56   34.18
> 
> 
> Based upon your stats it looks like you need more Bayes training. Your Bayes 00/99 hits should rank higher in the
> rules-fired stats and BAYES_50 shouldn't be in the top-10 at all.
> (of course if you've only been training for a week that would explain it).
> 
> For example, here's my top-10 hits (for a one month interval).
> 
> TOP SPAM RULES FIRED
> ----------------------------------------------------------------------
> RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
> ----------------------------------------------------------------------
>   1    T__BOTNET_NOTRUST               114907   60.32   86.81   42.66  0.5755
>   2    BAYES_99                        109138   32.98   82.45    0.01  0.9998
>   3    BAYES_999                       104903   31.70   79.25    0.01  0.9999
>   4    HTML_MESSAGE                    90850    79.41   68.63   86.59  0.3456
>   5    URIBL_BLACK                     90845    27.61   68.63    0.27  0.9942
>   6    T_QUARANTINE_1                  90640    27.40   68.47    0.02  0.9996
>   7    URIBL_DBL_SPAM                  79152    24.02   59.79    0.17  0.9956
>   8    KAM_VERY_BLACK_DBL              74301    22.45   56.13    0.00  1.0000
>   9    L_FROM_SPAMMER1k                73667    22.26   55.65    0.00  1.0000
>  10    T__RECEIVED_1                   72413    42.60   54.70   34.54  0.5135
> 
> TOP HAM RULES FIRED
> ----------------------------------------------------------------------
> RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
> ----------------------------------------------------------------------
>   1    BAYES_00                        182674   56.03    2.11   91.97  0.0150
>   2    HTML_MESSAGE                    171992   79.41   68.63   86.59  0.3456
>   3    SPF_PASS                        136623   63.08   54.52   68.78  0.3457
>   4    T_RP_MATCHES_RCVD               130879   53.75   35.54   65.89  0.2644
>   5    T__RECEIVED_2                   125492   53.76   39.62   63.18  0.2947
>   6    DKIM_SIGNED                     114808   38.57    9.72   57.80  0.1008
>   7    DKIM_VALID                      105385   34.70    7.16   53.06  0.0825
>   8    RCVD_IN_DNSWL_NONE              92951    29.90    4.56   46.80  0.0609
>   9    T__BOTNET_NOTRUST               84741    60.32   86.81   42.66  0.5755
>  10    KHOP_RCVD_TRUST                 84623    26.44    2.19   42.60  0.0331
> 
> Note how highly BAYES_00 and BAYES_99 ranked. What you don't see is that BAYES_50 is way down in the mud (below rank 50).
> 
> BTW, this is with a Bayes that is mostly fed via auto-learning. I occasionally
> hand-feed corner cases that get misclassified (usually things like phishes, or conference announcements that can
> look shaky).
> 
> 
> --
> Dave Funk                                  University of Iowa
> <dbfunk (at) engineering.uiowa.edu>        College of Engineering
> 319/335-5751   FAX: 319/384-0549           1256 Seamans Center
> Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
> #include <std_disclaimer.h>
> Better is not better, 'standard' is better. B{
> 
>  
> 
> Robert Chalmers
> 
> robert@chalmers.com.au  Quantum Radio: http://tinyurl.com/lwwddov
> 
> Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11.  XCode 7.2.1
> 
> 2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay
> 
>  
> 
>  
> 
>  
> 
> 
>

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

RE: Missed spam, suggestions?

Posted by Erickarlo Porro <ep...@earthcam.com>.
I would like to know how to get these stats too.

From: Robert Chalmers [mailto:robert@chalmers.com.au]
Sent: Tuesday, March 08, 2016 5:25 AM
To: users@spamassassin.apache.org
Subject: Re: Missed spam, suggestions?

Can I ask, how are you getting these stats please?

Thanks
On 8 Mar 2016, at 05:11, David B Funk <db...@engineering.uiowa.edu> wrote:

On Mon, 7 Mar 2016, Charles Sprickman wrote:


I’ve been running with some daily training for a little over a week and I’m seeing less spam in my inbox.  A few things have slipped through because Bayes tipped them below the default score; these were two phishing emails.

Here are some rule stats for anyone interested:

TOP SPAM RULES FIRED

RANK  RULE NAME               COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM

   1  TXREP                   13171      8.47    40.38    91.00   72.91
   2  HTML_MESSAGE            12714      8.18    38.98    87.85   90.80
   3  DCC_CHECK               10593      6.81    32.48    73.19   33.78
   4  RDNS_NONE               10269      6.60    31.48    70.95    5.63
   5  SPF_HELO_PASS           10070      6.48    30.87    69.58   23.41
   6  URIBL_BLACK              9711      6.25    29.77    67.10    1.58
   7  BODY_NEWDOMAIN_FMBLA     9550      6.14    29.28    65.98    1.64
   8  FROM_NEWDOMAIN_FMBLA     9483      6.10    29.07    65.52    1.36
   9  BAYES_99                 8486      5.46    26.02    58.63    1.18
  10  BAYES_999                8141      5.24    24.96    56.25    1.06

TOP HAM RULES FIRED

RANK  RULE NAME               COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM

   1  HTML_MESSAGE            16473      9.13    50.51    87.85   90.80
   2  DKIM_SIGNED             13776      7.64    42.24    13.81   75.93
   3  TXREP                   13228      7.33    40.56    91.00   72.91
   4  DKIM_VALID              12962      7.19    39.74    11.93   71.44
   5  RCVD_IN_DNSWL_NONE       9941      5.51    30.48     8.08   54.79
   6  DKIM_VALID_AU            8711      4.83    26.71     7.99   48.01
   7  BAYES_00                 8390      4.65    25.72     1.84   46.24
   8  RCVD_IN_JMF_W            7369      4.09    22.59     2.54   40.62
   9  RCVD_IN_MSPIKE_WL        6713      3.72    20.58     4.39   37.00
  10  BAYES_50                 6201      3.44    19.01    25.56   34.18

Based upon your stats it looks like you need more Bayes training. Your Bayes 00/99 hits should rank higher in the rules-fired stats and BAYES_50 shouldn't be in the top-10 at all.
(of course if you've only been training for a week that would explain it).

For example, here's my top-10 hits (for a one month interval).

TOP SPAM RULES FIRED
----------------------------------------------------------------------
RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
----------------------------------------------------------------------
  1    T__BOTNET_NOTRUST               114907   60.32   86.81   42.66  0.5755
  2    BAYES_99                        109138   32.98   82.45    0.01  0.9998
  3    BAYES_999                       104903   31.70   79.25    0.01  0.9999
  4    HTML_MESSAGE                    90850    79.41   68.63   86.59  0.3456
  5    URIBL_BLACK                     90845    27.61   68.63    0.27  0.9942
  6    T_QUARANTINE_1                  90640    27.40   68.47    0.02  0.9996
  7    URIBL_DBL_SPAM                  79152    24.02   59.79    0.17  0.9956
  8    KAM_VERY_BLACK_DBL              74301    22.45   56.13    0.00  1.0000
  9    L_FROM_SPAMMER1k                73667    22.26   55.65    0.00  1.0000
 10    T__RECEIVED_1                   72413    42.60   54.70   34.54  0.5135

TOP HAM RULES FIRED
----------------------------------------------------------------------
RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
----------------------------------------------------------------------
  1    BAYES_00                        182674   56.03    2.11   91.97  0.0150
  2    HTML_MESSAGE                    171992   79.41   68.63   86.59  0.3456
  3    SPF_PASS                        136623   63.08   54.52   68.78  0.3457
  4    T_RP_MATCHES_RCVD               130879   53.75   35.54   65.89  0.2644
  5    T__RECEIVED_2                   125492   53.76   39.62   63.18  0.2947
  6    DKIM_SIGNED                     114808   38.57    9.72   57.80  0.1008
  7    DKIM_VALID                      105385   34.70    7.16   53.06  0.0825
  8    RCVD_IN_DNSWL_NONE              92951    29.90    4.56   46.80  0.0609
  9    T__BOTNET_NOTRUST               84741    60.32   86.81   42.66  0.5755
 10    KHOP_RCVD_TRUST                 84623    26.44    2.19   42.60  0.0331

Note how highly BAYES_00 and BAYES_99 ranked. What you don't see is that BAYES_50 is way down in the mud (below rank 50).

BTW, this is with a Bayes that is mostly fed via auto-learning. I occasionally
hand-feed corner cases that get misclassified (usually things like phishes, or conference announcements that can look shaky).


--
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Robert Chalmers
robert@chalmers.com.au  Quantum Radio: http://tinyurl.com/lwwddov
Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11.  XCode 7.2.1
2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay




Re: Missed spam, suggestions?

Posted by Robert Chalmers <ro...@chalmers.com.au>.
Can I ask, how are you getting these stats please?

Thanks
> On 8 Mar 2016, at 05:11, David B Funk <db...@engineering.uiowa.edu> wrote:
> 
> On Mon, 7 Mar 2016, Charles Sprickman wrote:
> 
>> I’ve been running with some daily training for a little over a week and I’m seeing less spam in my inbox.  A few things have slipped through because Bayes tipped them below the default score; these were two phishing emails.
>> 
>> Here are some rule stats for anyone interested:
>> 
>> TOP SPAM RULES FIRED
>> 
>> RANK	RULE NAME                	COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
>> 
>>  1	TXREP                    	13171	  8.47	 40.38	 91.00	 72.91
>>  2	HTML_MESSAGE             	12714	  8.18	 38.98	 87.85	 90.80
>>  3	DCC_CHECK                	10593	  6.81	 32.48	 73.19	 33.78
>>  4	RDNS_NONE                	10269	  6.60	 31.48	 70.95	  5.63
>>  5	SPF_HELO_PASS            	10070	  6.48	 30.87	 69.58	 23.41
>>  6	URIBL_BLACK              	 9711	  6.25	 29.77	 67.10	  1.58
>>  7	BODY_NEWDOMAIN_FMBLA     	 9550	  6.14	 29.28	 65.98	  1.64
>>  8	FROM_NEWDOMAIN_FMBLA     	 9483	  6.10	 29.07	 65.52	  1.36
>>  9	BAYES_99                 	 8486	  5.46	 26.02	 58.63	  1.18
>> 10	BAYES_999                	 8141	  5.24	 24.96	 56.25	  1.06
>> 
>> TOP HAM RULES FIRED
>> 
>> RANK	RULE NAME                	COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
>> 
>>  1	HTML_MESSAGE             	16473	  9.13	 50.51	 87.85	 90.80
>>  2	DKIM_SIGNED              	13776	  7.64	 42.24	 13.81	 75.93
>>  3	TXREP                    	13228	  7.33	 40.56	 91.00	 72.91
>>  4	DKIM_VALID               	12962	  7.19	 39.74	 11.93	 71.44
>>  5	RCVD_IN_DNSWL_NONE       	 9941	  5.51	 30.48	  8.08	 54.79
>>  6	DKIM_VALID_AU            	 8711	  4.83	 26.71	  7.99	 48.01
>>  7	BAYES_00                 	 8390	  4.65	 25.72	  1.84	 46.24
>>  8	RCVD_IN_JMF_W            	 7369	  4.09	 22.59	  2.54	 40.62
>>  9	RCVD_IN_MSPIKE_WL        	 6713	  3.72	 20.58	  4.39	 37.00
>> 10	BAYES_50                 	 6201	  3.44	 19.01	 25.56	 34.18
>> 
> 
> Based upon your stats it looks like you need more Bayes training. Your Bayes 00/99 hits should rank higher in the rules-fired stats and BAYES_50 shouldn't be in the top-10 at all.
> (of course if you've only been training for a week that would explain it).
> 
> For example, here's my top-10 hits (for a one month interval).
> 
> TOP SPAM RULES FIRED
> ----------------------------------------------------------------------
> RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
> ----------------------------------------------------------------------
>   1    T__BOTNET_NOTRUST               114907   60.32   86.81   42.66  0.5755
>   2    BAYES_99                        109138   32.98   82.45    0.01  0.9998
>   3    BAYES_999                       104903   31.70   79.25    0.01  0.9999
>   4    HTML_MESSAGE                    90850    79.41   68.63   86.59  0.3456
>   5    URIBL_BLACK                     90845    27.61   68.63    0.27  0.9942
>   6    T_QUARANTINE_1                  90640    27.40   68.47    0.02  0.9996
>   7    URIBL_DBL_SPAM                  79152    24.02   59.79    0.17  0.9956
>   8    KAM_VERY_BLACK_DBL              74301    22.45   56.13    0.00  1.0000
>   9    L_FROM_SPAMMER1k                73667    22.26   55.65    0.00  1.0000
>  10    T__RECEIVED_1                   72413    42.60   54.70   34.54  0.5135
> 
> TOP HAM RULES FIRED
> ----------------------------------------------------------------------
> RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
> ----------------------------------------------------------------------
>   1    BAYES_00                        182674   56.03    2.11   91.97  0.0150
>   2    HTML_MESSAGE                    171992   79.41   68.63   86.59  0.3456
>   3    SPF_PASS                        136623   63.08   54.52   68.78  0.3457
>   4    T_RP_MATCHES_RCVD               130879   53.75   35.54   65.89  0.2644
>   5    T__RECEIVED_2                   125492   53.76   39.62   63.18  0.2947
>   6    DKIM_SIGNED                     114808   38.57    9.72   57.80  0.1008
>   7    DKIM_VALID                      105385   34.70    7.16   53.06  0.0825
>   8    RCVD_IN_DNSWL_NONE              92951    29.90    4.56   46.80  0.0609
>   9    T__BOTNET_NOTRUST               84741    60.32   86.81   42.66  0.5755
>  10    KHOP_RCVD_TRUST                 84623    26.44    2.19   42.60  0.0331
> 
> Note how highly BAYES 00/99 ranked. What you don't see is that BAYES_50 is way down in the mud (below 50 rank).
> 
> BTW, this is with a Bayes that is mostly fed via auto-learning. I occasionally
> hand feed corner cases that get mis-classified (usually things like phishes, or conference announcements that can look shaky).
> 
> 
> -- 
> Dave Funk                                  University of Iowa
> <dbfunk (at) engineering.uiowa.edu>        College of Engineering
> 319/335-5751   FAX: 319/384-0549           1256 Seamans Center
> Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
> #include <std_disclaimer.h>
> Better is not better, 'standard' is better. B{

Robert Chalmers
robert@chalmers.com.au  Quantum Radio: http://tinyurl.com/lwwddov
Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11.  XCode 7.2.1
2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. Lower Bay





Re: Missed spam, suggestions?

Posted by David B Funk <db...@engineering.uiowa.edu>.
On Mon, 7 Mar 2016, Charles Sprickman wrote:

> I’ve been running with some daily training for a little over a week and I’m seeing less spam in my inbox.  I’ve seen a few things slip through because bayes tipped them below the default score, these were two phishing emails.
>
> Here’s some rule stats for anyone interested:
>
> TOP SPAM RULES FIRED
>
> RANK	RULE NAME                	COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
>
>   1	TXREP                    	13171	  8.47	 40.38	 91.00	 72.91
>   2	HTML_MESSAGE             	12714	  8.18	 38.98	 87.85	 90.80
>   3	DCC_CHECK                	10593	  6.81	 32.48	 73.19	 33.78
>   4	RDNS_NONE                	10269	  6.60	 31.48	 70.95	  5.63
>   5	SPF_HELO_PASS            	10070	  6.48	 30.87	 69.58	 23.41
>   6	URIBL_BLACK              	 9711	  6.25	 29.77	 67.10	  1.58
>   7	BODY_NEWDOMAIN_FMBLA     	 9550	  6.14	 29.28	 65.98	  1.64
>   8	FROM_NEWDOMAIN_FMBLA     	 9483	  6.10	 29.07	 65.52	  1.36
>   9	BAYES_99                 	 8486	  5.46	 26.02	 58.63	  1.18
>  10	BAYES_999                	 8141	  5.24	 24.96	 56.25	  1.06
>
> TOP HAM RULES FIRED
>
> RANK	RULE NAME                	COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
>
>   1	HTML_MESSAGE             	16473	  9.13	 50.51	 87.85	 90.80
>   2	DKIM_SIGNED              	13776	  7.64	 42.24	 13.81	 75.93
>   3	TXREP                    	13228	  7.33	 40.56	 91.00	 72.91
>   4	DKIM_VALID               	12962	  7.19	 39.74	 11.93	 71.44
>   5	RCVD_IN_DNSWL_NONE       	 9941	  5.51	 30.48	  8.08	 54.79
>   6	DKIM_VALID_AU            	 8711	  4.83	 26.71	  7.99	 48.01
>   7	BAYES_00                 	 8390	  4.65	 25.72	  1.84	 46.24
>   8	RCVD_IN_JMF_W            	 7369	  4.09	 22.59	  2.54	 40.62
>   9	RCVD_IN_MSPIKE_WL        	 6713	  3.72	 20.58	  4.39	 37.00
>  10	BAYES_50                 	 6201	  3.44	 19.01	 25.56	 34.18
>

Based upon your stats it looks like you need more Bayes training. 
Your Bayes 00/99 hits should rank higher in the rules-fired stats and BAYES_50 
shouldn't be in the top-10 at all.
(of course if you've only been training for a week that would explain it).

For example, here's my top-10 hits (for a one month interval).

TOP SPAM RULES FIRED
----------------------------------------------------------------------
RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
----------------------------------------------------------------------
    1    T__BOTNET_NOTRUST               114907   60.32   86.81   42.66  0.5755
    2    BAYES_99                        109138   32.98   82.45    0.01  0.9998
    3    BAYES_999                       104903   31.70   79.25    0.01  0.9999
    4    HTML_MESSAGE                    90850    79.41   68.63   86.59  0.3456
    5    URIBL_BLACK                     90845    27.61   68.63    0.27  0.9942
    6    T_QUARANTINE_1                  90640    27.40   68.47    0.02  0.9996
    7    URIBL_DBL_SPAM                  79152    24.02   59.79    0.17  0.9956
    8    KAM_VERY_BLACK_DBL              74301    22.45   56.13    0.00  1.0000
    9    L_FROM_SPAMMER1k                73667    22.26   55.65    0.00  1.0000
   10    T__RECEIVED_1                   72413    42.60   54.70   34.54  0.5135

TOP HAM RULES FIRED
----------------------------------------------------------------------
RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
----------------------------------------------------------------------
    1    BAYES_00                        182674   56.03    2.11   91.97  0.0150
    2    HTML_MESSAGE                    171992   79.41   68.63   86.59  0.3456
    3    SPF_PASS                        136623   63.08   54.52   68.78  0.3457
    4    T_RP_MATCHES_RCVD               130879   53.75   35.54   65.89  0.2644
    5    T__RECEIVED_2                   125492   53.76   39.62   63.18  0.2947
    6    DKIM_SIGNED                     114808   38.57    9.72   57.80  0.1008
    7    DKIM_VALID                      105385   34.70    7.16   53.06  0.0825
    8    RCVD_IN_DNSWL_NONE              92951    29.90    4.56   46.80  0.0609
    9    T__BOTNET_NOTRUST               84741    60.32   86.81   42.66  0.5755
   10    KHOP_RCVD_TRUST                 84623    26.44    2.19   42.60  0.0331

Note how highly BAYES 00/99 ranked. What you don't see is that BAYES_50 is way 
down in the mud (below 50 rank).

BTW, this is with a Bayes that is mostly fed via auto-learning. I occasionally
hand feed corner cases that get mis-classified (usually things like phishes, or 
conference announcements that can look shaky).


-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: Missed spam, suggestions?

Posted by John Hardin <jh...@impsec.org>.
On Tue, 8 Mar 2016, Matus UHLAR - fantomas wrote:

>> On Mar 8, 2016, at 7:31 AM, Matus UHLAR - fantomas <uh...@fantomas.sk> 
>> wrote:
>> >  how can these two stats be different?
>
> On 08.03.16 10:19, @lbutlr wrote:
>> Because one is for SPAM and one is for HAM.
>
> TOP SPAM RULES FIRED
>
> RANK	RULE NAME                	COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
>
>   2	HTML_MESSAGE             	12714	  8.18	 38.98	 87.85	 90.80
>
> TOP HAM RULES FIRED
>
> RANK	RULE NAME                	COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
>
>   1	HTML_MESSAGE             	16473	  9.13	 50.51	 87.85	 90.80
>
>
> Why did the same rule hit 38.98% of all mail and 50.51% of all mail?

Speculation: 38.98 %OFMAIL = %OFSPAM * %SPAM (spam's share of all mail), not %TOTAL
so: HTML_MESSAGE hit 87.85% of spam, and *those hits* were 38.98% of total 
messages processed.

?
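
The speculation above can be checked against the two HTML_MESSAGE rows. This is a minimal Python sketch using only the figures quoted in this thread. It assumes that COUNT in each table is the number of hits within that class (spam or ham) and that %OFMAIL is that count divided by all mail; neither assumption is confirmed by the script's author here.

```python
# Figures copied from the HTML_MESSAGE rows of the two quoted tables.
spam_hits, pct_of_spam = 12714, 87.85   # row from TOP SPAM RULES FIRED
ham_hits, pct_of_ham = 16473, 90.80     # row from TOP HAM RULES FIRED


def implied_total(hits, pct):
    """Back out the size of a class from a hit count and the percentage it represents."""
    return hits / (pct / 100.0)


spam_total = implied_total(spam_hits, pct_of_spam)   # roughly 14,472 spam messages
ham_total = implied_total(ham_hits, pct_of_ham)      # roughly 18,142 ham messages
all_mail = spam_total + ham_total

# Assumed reading: %OFMAIL = hits within the class / all mail processed.
pct_of_mail_spam_table = 100.0 * spam_hits / all_mail
pct_of_mail_ham_table = 100.0 * ham_hits / all_mail

print(f"spam table %OFMAIL: {pct_of_mail_spam_table:.2f}")
print(f"ham table  %OFMAIL: {pct_of_mail_ham_table:.2f}")
```

Under that reading the two %OFMAIL values come out very close to the 38.98 and 50.51 in the report, which is consistent with the speculation: each table's %OFMAIL counts only that table's class of hits against all mail.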

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Failure to plan ahead on someone else's part does not constitute
   an emergency on my part.                 -- David W. Barts in a.s.r
-----------------------------------------------------------------------
  5 days until Daylight Saving Time begins in U.S. - Spring Forward

Re: Missed spam, suggestions?

Posted by Benny Pedersen <me...@junc.eu>.
On 8. mar. 2016 18.42.03 Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:

> Why did the same rule hit 38.98% of all mail and 50.51% of all mail?

grep foo ./hamfolder
grep bar ./spamfolder

Why should both folders need same counts of mails ?

Re: Missed spam, suggestions?

Posted by David B Funk <db...@engineering.uiowa.edu>.
On Tue, 8 Mar 2016, Matus UHLAR - fantomas wrote:

>>>> On Mar 8, 2016, at 7:31 AM, Matus UHLAR - fantomas <uh...@fantomas.sk> 
>>>> wrote:
>>>>> how can these two stats be different?
>
>>> On 08.03.16 10:19, @lbutlr wrote:
>>>> Because one is for SPAM and one is for HAM.
>
>>> On Mar 8, 2016, at 10:41 AM, Matus UHLAR - fantomas <uh...@fantomas.sk> 
>>> wrote:
>>> Why did you remove the important part?
>
> On 08.03.16 11:16, @lbutlr wrote:
>> I didn’t.
>
> yes, you did, so I've had to paste them again below:
>
>>> TOP SPAM RULES FIRED
>>>
>>> RANK	RULE NAME                	COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
>>>
>>>   2	HTML_MESSAGE             	12714	  8.18	 38.98	 87.85	 90.80
>>>
>>> TOP HAM RULES FIRED
>>>
>>> RANK	RULE NAME                	COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
>>>
>>>   1	HTML_MESSAGE             	16473	  9.13	 50.51	 87.85	 90.80
>>> 
>>> 
>>> Why did the same rule hit 38.98% of all mail and 50.51% of all mail?
>> 
>> Because one is checking SPAM and one is checking HAM.
>
> so why was %OFMAIL different from %OFSPAM in the first case and from %OFHAM
> in the second case?
>
>>> seems that the mail counts were different, but why?
>> 
>> Because there are differing amounts of SPAM and HAM?
>
> if we are only checking spam mail for a given rule, how can the number of
> all hits be different from the number of spam hits? they all should be spam,
> shouldn't they?

Assuming that the OP was using Dallas Engelken's "sa-stats.pl" script
(I was), the percentage columns of the report line for each rule should be
IDENTICAL in both tables; only the rank and count differ.

This script takes spamd's log output as input and aggregates a digest
of all the rule hits. In a given log there will be lines that are
spam results ("spamd: result: Y 75") and lines that are ham results ("spamd: result: . -3").
Each line (spam and ham) carries a list of the rules that fired on that 
particular message:

2016-03-08T12:37:44.833847-06:00 s-l107 spamd[10463]: spamd: result: . -3 - 
BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,KHOP_RCVD_TRUST,L_LOCAL_MUCHO_DOT_LINES2,RCVD_IN_DNSWL_LOW,RCVD_IN_HOSTKARMA_YE,RP_MATCHES_RCVD,SPF_PASS,T__RECEIVED_1 
scantime=3.5,size=11059,user=redacted,uid=115,required_score=6.0,rhost=s-l012.engr.uiowa.edu,raddr=128.255.17.253,rport=35620,mid=<re...@email.amazonses.com>,bayes=0.000000,autolearn=ham 
autolearn_force=no

So for the HTML_MESSAGE rule, I get stats of:
grep HTML_MESSAGE sa-stats-dec.out
    4    HTML_MESSAGE                    90850    79.41   68.63   86.59  0.3456
    2    HTML_MESSAGE                    171992   79.41   68.63   86.59  0.3456

This means that, of all the messages processed (for the duration of that log run), 
that rule hit 79.41% of all messages, 68.63% of the lines classified as 
spam (a count of 90850, giving a rank of 4), and 86.59% of the lines 
classified as ham (a count of 171992, giving a rank of 2).

Thus for a given rule, the %all-messages, %spam %ham should be IDENTICAL.
(assuming they are from the same log run).
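
The aggregation described above can be sketched in a few lines of Python. This is an illustration of the idea, not Dallas Engelken's actual sa-stats.pl: the regular expression, the function names, and the sample log lines are all assumptions made for the example. The S/O formula (spam hits divided by all hits) is likewise an assumed reading, though it does reproduce the HTML_MESSAGE value shown above (90850 / (90850 + 171992) is about 0.3456).

```python
import re
from collections import Counter

# Matches the part of a spamd log line that carries the verdict and rule list.
# "Y" marks a spam result and "." a ham result; rule names are comma-separated.
RESULT_RE = re.compile(
    r"spamd: result: (?P<verdict>[Y.]) (?P<score>-?\d+) - (?P<rules>\S*)"
)


def tally(lines):
    """Count messages and per-rule hits, split by spam/ham verdict."""
    totals = Counter()
    hits = {"spam": Counter(), "ham": Counter()}
    for line in lines:
        m = RESULT_RE.search(line)
        if not m:
            continue  # not a result line
        kind = "spam" if m.group("verdict") == "Y" else "ham"
        totals[kind] += 1
        hits[kind].update(r for r in m.group("rules").split(",") if r)
    return totals, hits


def rule_stats(rule, totals, hits):
    """Percentages for one rule; identical whichever table the rule is listed in."""
    spam, ham = hits["spam"][rule], hits["ham"][rule]
    n_spam, n_ham = totals["spam"], totals["ham"]
    n_all = n_spam + n_ham
    return {
        "pct_of_mail": 100.0 * (spam + ham) / n_all if n_all else 0.0,
        "pct_of_spam": 100.0 * spam / n_spam if n_spam else 0.0,
        "pct_of_ham": 100.0 * ham / n_ham if n_ham else 0.0,
        "s_o": spam / (spam + ham) if spam + ham else 0.0,
    }


# Hypothetical, shortened log lines in the same shape as the real one above.
sample = [
    "spamd[1]: spamd: result: Y 75 - BAYES_99,HTML_MESSAGE,URIBL_BLACK scantime=1.0",
    "spamd[1]: spamd: result: . -3 - BAYES_00,HTML_MESSAGE,SPF_PASS scantime=3.5",
    "spamd[1]: spamd: result: . 0 - BAYES_00,SPF_PASS scantime=0.9",
]
totals, hits = tally(sample)
stats = rule_stats("HTML_MESSAGE", totals, hits)
print(stats)
```

Because every percentage is computed from the same per-rule counters, a rule that appears in both the spam and the ham table gets the same %OFMAIL, %OFSPAM and %OFHAM in each; only its rank and count change.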

So for the OP's original post, having %spam and %ham be identical but %all-messages 
being different is weird. It could be that he has a different version of
the sa-stats script; his report has an additional field, that "%of-rules" thing.

So to Charles Sprickman, which sa-stats script did you use to generate your 
rules report?


-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: Missed spam, suggestions?

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
>>> On Mar 8, 2016, at 7:31 AM, Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:
>>>> how can these two stats be different?

>> On 08.03.16 10:19, @lbutlr wrote:
>>> Because one is for SPAM and one is for HAM.

>> On Mar 8, 2016, at 10:41 AM, Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:
>> Why did you remove the important part?

On 08.03.16 11:16, @lbutlr wrote:
>I didn’t.

yes, you did, so I've had to paste them again below:

>> TOP SPAM RULES FIRED
>>
>> RANK	RULE NAME                	COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
>>
>>   2	HTML_MESSAGE             	12714	  8.18	 38.98	 87.85	 90.80
>>
>> TOP HAM RULES FIRED
>>
>> RANK	RULE NAME                	COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
>>
>>   1	HTML_MESSAGE             	16473	  9.13	 50.51	 87.85	 90.80
>>
>>
>> Why did the same rule hit 38.98% of all mail and 50.51% of all mail?
>
>Because one is checking SPAM and one is checking HAM.

so why was %OFMAIL different from %OFSPAM in the first case and from %OFHAM
in the second case?

>> seems that the mail counts were different, but why?
>
>Because there are differing amounts of SPAM and HAM?

if we are only checking spam mail for a given rule, how can the number of
all hits be different from the number of spam hits? they all should be spam,
shouldn't they?

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
On the other hand, you have different fingers. 

Re: Missed spam, suggestions?

Posted by "@lbutlr" <kr...@kreme.com>.
> On Mar 8, 2016, at 10:41 AM, Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:
> 
>> On Mar 8, 2016, at 7:31 AM, Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:
>>> how can these two stats be different?
> 
> On 08.03.16 10:19, @lbutlr wrote:
>> Because one is for SPAM and one is for HAM.
> 
> Why did you remove the important part?

I didn’t.

> TOP SPAM RULES FIRED
> 
> RANK	RULE NAME                	COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
> 
>   2	HTML_MESSAGE             	12714	  8.18	 38.98	 87.85	 90.80
> 
> TOP HAM RULES FIRED
> 
> RANK	RULE NAME                	COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
> 
>   1	HTML_MESSAGE             	16473	  9.13	 50.51	 87.85	 90.80
> 
> 
> Why did the same rule hit 38.98% of all mail and 50.51% of all mail?

Because one is checking SPAM and one is checking HAM.

> seems that the mail counts were different, but why?

Because there are differing amounts of SPAM and HAM?


-- 
"Rosa sat, so Martin could walk. Martin walked, so Obama could run.
Obama ran, so our children can fly." (paraphrased from NPR)


Re: Missed spam, suggestions?

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
>On Mar 8, 2016, at 7:31 AM, Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:
>> how can these two stats be different?

On 08.03.16 10:19, @lbutlr wrote:
>Because one is for SPAM and one is for HAM.

Why did you remove the important part?

TOP SPAM RULES FIRED

RANK	RULE NAME                	COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM

    2	HTML_MESSAGE             	12714	  8.18	 38.98	 87.85	 90.80

TOP HAM RULES FIRED

RANK	RULE NAME                	COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM

    1	HTML_MESSAGE             	16473	  9.13	 50.51	 87.85	 90.80


Why did the same rule hit 38.98% of all mail and 50.51% of all mail?

seems that the mail counts were different, but why?
did Charles generate the stats at very different times? 

comparing results from the same set would be much better...

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
A day without sunshine is like, night.

Re: Missed spam, suggestions?

Posted by "@lbutlr" <kr...@kreme.com>.
On Mar 8, 2016, at 7:31 AM, Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:
> how can these two stats be different?

Because one is for SPAM and one is for HAM.

-- 
No man is free who is not master of himself


Re: Missed spam, suggestions?

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 07.03.16 23:39, Charles Sprickman wrote:
>TOP SPAM RULES FIRED
>
>RANK	RULE NAME                	COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
>
>   2	HTML_MESSAGE             	12714	  8.18	 38.98	 87.85	 90.80

>TOP HAM RULES FIRED
>
>RANK	RULE NAME                	COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
>
>   1	HTML_MESSAGE             	16473	  9.13	 50.51	 87.85	 90.80

how can these two stats be different?



-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
"They say when you play that M$ CD backward you can hear satanic messages."
"That's nothing. If you play it forward it will install Windows."

Re: Missed spam, suggestions?

Posted by Charles Sprickman <sp...@bway.net>.
> On Feb 29, 2016, at 3:18 PM, Reindl Harald <h....@thelounge.net> wrote:
> 
> On 29.02.2016 at 21:05, Charles Sprickman wrote:
>>> On Feb 29, 2016, at 4:23 AM, Reindl Harald <h....@thelounge.net> wrote:
>>> 
>>> On 29.02.2016 at 06:24, Charles Sprickman wrote:
>>>> I’ve not had much luck with Bayes - when I had it enabled recently on a per-user basis it was just hitting the master DB server too hard with updates
>>> 
>>> just make a sitewide bayes (https://wiki.apache.org/spamassassin/SiteWideBayesSetup) without autolearn / autoexpire and the default database in a folder read-only for the daemon
>>> 
>> 
>> I think I still have to stick with a db-backed option since I need to keep two SA servers in sync.
> 
> and i know that it doesn't matter
> 
> nothing is easier than rsyncing the bayes folder to several machines at the end of the learning script; we even share the site-wide bayes over web services with external entities, so it covers around 5000 users in total at the moment

I’m not seeing much of a change in load after enabling this with a global user and no autolearn.  I think the db was really only constrained on the inserts/updates.

> 
>> I’ll try that today and see how the load looks.  My concern with disabling autolearn is that then I’m the only one training.  My spam probably looks like everyone else’s, but my ham is very different, lots of list traffic and such.
> 
> you should be the only one who trains in most cases for several reasons
> 
> * few to zero users train enough ham and spam for a proper bayes
> * wrongly classified autolearn takes a wrong direction sooner or later
> 
> given that we have now maintained a site-wide bayes for more than a year for inbound MX, re-used on submission servers to minimize the impact of hacked accounts, and it works so much better than all the "user bayes" solutions of the last decade, it's the way to go if you *really* want proper operations

I’ve been running with some daily training for a little over a week and I’m seeing less spam in my inbox.  I’ve seen a few things slip through because Bayes tipped them below the default score; these were two phishing emails.

Here’s some rule stats for anyone interested:

TOP SPAM RULES FIRED

RANK	RULE NAME                	COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM

   1	TXREP                    	13171	  8.47	 40.38	 91.00	 72.91
   2	HTML_MESSAGE             	12714	  8.18	 38.98	 87.85	 90.80
   3	DCC_CHECK                	10593	  6.81	 32.48	 73.19	 33.78
   4	RDNS_NONE                	10269	  6.60	 31.48	 70.95	  5.63
   5	SPF_HELO_PASS            	10070	  6.48	 30.87	 69.58	 23.41
   6	URIBL_BLACK              	 9711	  6.25	 29.77	 67.10	  1.58
   7	BODY_NEWDOMAIN_FMBLA     	 9550	  6.14	 29.28	 65.98	  1.64
   8	FROM_NEWDOMAIN_FMBLA     	 9483	  6.10	 29.07	 65.52	  1.36
   9	BAYES_99                 	 8486	  5.46	 26.02	 58.63	  1.18
  10	BAYES_999                	 8141	  5.24	 24.96	 56.25	  1.06

TOP HAM RULES FIRED

RANK	RULE NAME                	COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM

   1	HTML_MESSAGE             	16473	  9.13	 50.51	 87.85	 90.80
   2	DKIM_SIGNED              	13776	  7.64	 42.24	 13.81	 75.93
   3	TXREP                    	13228	  7.33	 40.56	 91.00	 72.91
   4	DKIM_VALID               	12962	  7.19	 39.74	 11.93	 71.44
   5	RCVD_IN_DNSWL_NONE       	 9941	  5.51	 30.48	  8.08	 54.79
   6	DKIM_VALID_AU            	 8711	  4.83	 26.71	  7.99	 48.01
   7	BAYES_00                 	 8390	  4.65	 25.72	  1.84	 46.24
   8	RCVD_IN_JMF_W            	 7369	  4.09	 22.59	  2.54	 40.62
   9	RCVD_IN_MSPIKE_WL        	 6713	  3.72	 20.58	  4.39	 37.00
  10	BAYES_50                 	 6201	  3.44	 19.01	 25.56	 34.18

Charles



Re: Missed spam, suggestions?

Posted by Reindl Harald <h....@thelounge.net>.

On 29.02.2016 at 21:05, Charles Sprickman wrote:
>> On Feb 29, 2016, at 4:23 AM, Reindl Harald <h....@thelounge.net> wrote:
>>
>> On 29.02.2016 at 06:24, Charles Sprickman wrote:
>>> I’ve not had much luck with Bayes - when I had it enabled recently on a per-user basis it was just hitting the master DB server too hard with updates
>>
>> just make a sitewide bayes (https://wiki.apache.org/spamassassin/SiteWideBayesSetup) without autolearn / autoexpire and the default database in a folder read-only for the daemon
>>
>
> I think I still have to stick with a db-backed option since I need to keep two SA servers in sync.

and i know that it doesn't matter

nothing is easier than rsyncing the bayes folder to several machines at the 
end of the learning script; we even share the site-wide bayes over 
web services with external entities, so it covers around 5000 users in 
total at the moment

> I’ll try that today and see how the load looks.  My concern with disabling autolearn is that then I’m the only one training.  My spam probably looks like everyone else’s, but my ham is very different, lots of list traffic and such.

you should be the only one who trains in most cases for several reasons

* few to zero users train enough ham and spam for a proper bayes
* wrongly classified autolearn takes a wrong direction sooner or later

given that we have now maintained a site-wide bayes for more than a year for 
inbound MX, re-used on submission servers to minimize the impact of 
hacked accounts, and it works so much better than all the "user bayes" 
solutions of the last decade, it's the way to go if you *really* want proper 
operations


Re: Missed spam, suggestions?

Posted by Charles Sprickman <sp...@bway.net>.
> On Feb 29, 2016, at 4:23 AM, Reindl Harald <h....@thelounge.net> wrote:
> 
> 
> 
> On 29.02.2016 at 06:24, Charles Sprickman wrote:
>> I’ve not had much luck with Bayes - when I had it enabled recently on a per-user basis it was just hitting the master DB server too hard with updates
> 
> just make a sitewide bayes (https://wiki.apache.org/spamassassin/SiteWideBayesSetup) without autolearn / autoexpire and the default database in a folder read-only for the daemon
> 

I think I still have to stick with a db-backed option since I need to keep two SA servers in sync.

I’ll try that today and see how the load looks.  My concern with disabling autolearn is that then I’m the only one training.  My spam probably looks like everyone else’s, but my ham is very different, lots of list traffic and such.

> a filter without bayes is worthless

It seems so. :)

Thanks,

Charles
--
Charles Sprickman
NetEng/SysAdmin
Bway.net - New York's Best Internet www.bway.net
spork@bway.net - 212.982.9800


> 
> 0      61323    SPAM
> 0      21811    HAM
> 0    2547152    TOKEN
> 
> total 73M
> -rw------- 1 sa-milt sa-milt 10M 2016-02-29 00:21 bayes_seen
> -rw------- 1 sa-milt sa-milt 81M 2016-02-29 00:21 bayes_toks
> 
> BAYES_00        29161   73.70 %
> BAYES_05          764    1.93 %
> BAYES_20          931    2.35 %
> BAYES_40          815    2.05 %
> BAYES_50         2909    7.35 %
> BAYES_60          424    1.07 %     8.14 % (OF TOTAL BLOCKED)
> BAYES_80          337    0.85 %     6.47 % (OF TOTAL BLOCKED)
> BAYES_95          306    0.77 %     5.87 % (OF TOTAL BLOCKED)
> BAYES_99         3918    9.90 %    75.25 % (OF TOTAL BLOCKED)
> BAYES_999        3491    8.82 %    67.05 % (OF TOTAL BLOCKED)
> 
> DNSWL           53551   91.16 %
> SPF             38530   65.59 %
> SPF/DKIM WL     16750   28.51 %
> SHORTCIRCUIT    19112   32.53 %
> 
> BLOCKED          5206    8.86 %
> SPAMMY           4985    8.48 %    95.75 % (OF TOTAL BLOCKED)
> 


Re: Missed spam, suggestions?

Posted by Reindl Harald <h....@thelounge.net>.

On 29.02.2016 at 06:24, Charles Sprickman wrote:
> I’ve not had much luck with Bayes - when I had it enabled recently on a per-user basis it was just hitting the master DB server too hard with updates

just make a sitewide bayes 
(https://wiki.apache.org/spamassassin/SiteWideBayesSetup) without 
autolearn / autoexpire, with the default database in a folder that is 
read-only for the daemon

a filter without bayes is worthless

0      61323    SPAM
0      21811    HAM
0    2547152    TOKEN

total 73M
-rw------- 1 sa-milt sa-milt 10M 2016-02-29 00:21 bayes_seen
-rw------- 1 sa-milt sa-milt 81M 2016-02-29 00:21 bayes_toks

BAYES_00        29161   73.70 %
BAYES_05          764    1.93 %
BAYES_20          931    2.35 %
BAYES_40          815    2.05 %
BAYES_50         2909    7.35 %
BAYES_60          424    1.07 %     8.14 % (OF TOTAL BLOCKED)
BAYES_80          337    0.85 %     6.47 % (OF TOTAL BLOCKED)
BAYES_95          306    0.77 %     5.87 % (OF TOTAL BLOCKED)
BAYES_99         3918    9.90 %    75.25 % (OF TOTAL BLOCKED)
BAYES_999        3491    8.82 %    67.05 % (OF TOTAL BLOCKED)

DNSWL           53551   91.16 %
SPF             38530   65.59 %
SPF/DKIM WL     16750   28.51 %
SHORTCIRCUIT    19112   32.53 %

BLOCKED          5206    8.86 %
SPAMMY           4985    8.48 %    95.75 % (OF TOTAL BLOCKED)
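
The "(OF TOTAL BLOCKED)" column above appears to be each count divided by the BLOCKED count of 5206. A small Python check of that reading, using only the figures posted; the interpretation is an assumption, as the script that produced the report is not shown.

```python
# Counts copied from the report above.
blocked = 5206
counts = {
    "BAYES_60": 424,
    "BAYES_80": 337,
    "BAYES_95": 306,
    "BAYES_99": 3918,
    "BAYES_999": 3491,
    "SPAMMY": 4985,
}


def pct_of_blocked(count, total=blocked):
    """Share of blocked messages, in percent (assumed meaning of the column)."""
    return 100.0 * count / total


shares = {name: pct_of_blocked(n) for name, n in counts.items()}
for name, share in shares.items():
    print(f"{name:<10} {share:6.2f} %")
```

The computed shares agree with the posted column to within 0.01; the posted values look truncated rather than rounded. BAYES_999 fires together with BAYES_99, so those two rows overlap and the column is not expected to sum to 100%.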


Re: Missed spam, suggestions?

Posted by Tom Hendrikx <to...@whyscream.net>.

On 29-02-16 06:24, Charles Sprickman wrote:
> Hi all,
> 
> Recently I occasionally get bursts of spam that slips through Postfix
> (postscreen BL checks, protocol checks) and SpamAssassin.  I just had
> another big jump in the last week.  This was mostly spam touting Oil
> Changes, SUV sales and Lawyer Finders.
> 
> What I just did was go through a collection of missed spam and re-ran
> it through spamassassin. All of it jumped from originally scoring
> around 2-3 to a minimum of 6.5 with most hitting around 12.  The
> biggest difference I see is that DNSBL and URIBL services had started
> hitting. When originally received, these emails all originated from
> very clean IPs.
> 
> I have TXREP enabled as well, but that doesn’t seem to be having
> either a positive or negative impact.
> 
> What are my options to try to catch this junk before it hits the
> various *BLs?
> 
> I’ve not had much luck with Bayes - when I had it enabled recently on
> a per-user basis it was just hitting the master DB server too hard
> with updates.  I’m considering enabling it again with a shared db for
> all users, which I hope might work better.  It would only be auto
> trained, perhaps with some manual training by me.
> 
> Here’s a few samples, hosted elsewhere so as not to trip anyone’s
> filters:
> 
> https://gist.github.com/anonymous/0fcaf481875959c9151f (2.7 on
> Friday, 14 tonight)
> 
> https://gist.github.com/anonymous/a5396f68699392808988 (3.4 earlier
> tonight, 6.5 just now)
> 
> I have more samples, I can dig them up if that’s helpful.
> 
> Sometimes I wonder how much this has to do with the age of our domain
> and the fact that it begins with “b”. :)
> 
> The only thing I’ve been contemplating is a local spamtrap and DNSBL.
> We have a site that’s regularly trawled for email addresses, so
> seeding it should not be too difficult…
> 

Hi,

If you want to give the RBLs a bit more time to kick in, you could consider
greylisting (or postscreen after-220 checks, which also cause a delay and
a retry).
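
For readers unfamiliar with greylisting, the idea can be sketched in Python as follows. This is a toy illustration, not how any particular greylisting daemon or postscreen is implemented; the class name, the 300-second delay, and the reply strings are assumptions chosen for the example.

```python
import time


class Greylist:
    """Toy greylist keyed on the (client IP, sender, recipient) triplet."""

    def __init__(self, min_delay=300, clock=time.time):
        self.min_delay = min_delay   # seconds a new triplet must wait (assumed value)
        self.clock = clock           # injectable clock, handy for testing
        self.first_seen = {}         # triplet -> time of first delivery attempt

    def check(self, client_ip, sender, recipient):
        """Return an SMTP-style verdict for one delivery attempt."""
        key = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        first = self.first_seen.setdefault(key, now)
        if now - first < self.min_delay:
            # Unknown or too-recent triplet: ask the sender to retry later.
            return "450 try again later"
        # The sender came back after the delay: let the message through
        # to the normal checks (DNSBLs, SpamAssassin, ...).
        return "250 ok"


# Usage with a fake clock so the example does not have to sleep.
fake_now = [0.0]
greylist = Greylist(min_delay=300, clock=lambda: fake_now[0])
first_try = greylist.check("192.0.2.1", "a@example.com", "b@example.net")
fake_now[0] = 301.0
retry = greylist.check("192.0.2.1", "a@example.com", "b@example.net")
print(first_try, "/", retry)
```

The benefit for this thread's problem is the delay itself: a legitimate server retries and gets through, while a fresh spam source has a few more minutes to land on the DNSBLs and URIBLs before its message is scored.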

Regards,
	Tom