Posted to users@spamassassin.apache.org by Alex <my...@gmail.com> on 2010/04/19 08:29:48 UTC

Top Ten Rules

Hi all,

Thought I would share what my top ten rules are for the past few days,
and see if they compare with the consensus on the list:

BAYES_99                           76.5%
HTML_MESSAGE                       76.0%
RAZOR2_CHECK                       72.8%
RAZOR2_CF_RANGE_51_100             69.7%
RAZOR2_CF_RANGE_E8_51_100          59.9%
RELAYCOUNTRY_US                    59.4%
SPF_PASS                           58.6%
SPF_HELO_PASS                      58.6%
URIBL_BLACK                        57.4%
MIME_HTML_ONLY                     51.5%
SEM_URIRED                         48.2%
RCVD_IN_SEMBLACK                   36.3%
URIBL_JP_SURBL                     35.5%
RDNS_NONE                          34.6%

I know it's subjective based on site email, but if there was some
large disparity perhaps it would highlight where a problem might be
(for me or someone else...)

Thanks,
Alex

Re: Top Ten Rules

Posted by Per Jessen <pe...@computer.org>.
Alex wrote:

> Hi all,
> 
> Thought I would share what my top ten rules are for the past few days,
> and see if they compare with the consensus on the list:

I think you ought to count the points scored, not just the rules hit. 


/Per Jessen, Zürich


Re: Top Ten Rules

Posted by Alex <my...@gmail.com>.
Hi,

>   1    BAYES_99                         9484    46.62   97.87    0.00
>   2    BOTNET                           9024    44.38   93.13    0.05
>   3    RCVD_IN_BRBL_LASTEXT             8744    42.99   90.24    0.01
>   4    RCVD_IN_HOSTKARMA_BL             8581    42.18   88.56    0.01
>   5    RCVD_IN_XBL                      8283    40.71   85.48    0.00
>   6    RAZOR2_CHECK                     7696    37.88   79.42    0.10

Quite a difference in the top few from mine. I like the results from
the Barracuda list. I have to upgrade to v3.3 soon.

I wish I had something better than my homegrown scripts for
generating these stats, since I'm not using spamd. Some type of
real-time "spamtop" would be really neat.
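
For anyone else without spamd logs, a rough "spamtop" can be built by
tallying the rule names stored in each message's X-Spam-Status header.
This is only a sketch: it assumes the usual header layout
("Yes, score=... required=... tests=RULE_A,RULE_B ..."), and the
function names parse_status and top_spam_rules are made up here.

```python
import re
from collections import Counter


def parse_status(value):
    """Return (is_spam, [rule names]) from one X-Spam-Status header value."""
    # Folded headers break the tests list after a comma; join it back up.
    flat = re.sub(r",\s+", ",", value.strip())
    is_spam = flat.startswith("Yes")
    match = re.search(r"tests=(\S*)", flat)
    names = match.group(1).split(",") if match else []
    return is_spam, [n for n in names if n and n != "none"]


def top_spam_rules(status_values, limit=10):
    """Rank rules by hits on spam; also report %OFSPAM and %OFHAM."""
    spam_hits, ham_hits = Counter(), Counter()
    n_spam = n_ham = 0
    for value in status_values:
        is_spam, rules = parse_status(value)
        if is_spam:
            n_spam += 1
            spam_hits.update(rules)
        else:
            n_ham += 1
            ham_hits.update(rules)
    rows = []
    for rule, count in spam_hits.most_common(limit):
        pct_spam = 100.0 * count / n_spam
        pct_ham = 100.0 * ham_hits[rule] / n_ham if n_ham else 0.0
        rows.append((rule, count, round(pct_spam, 2), round(pct_ham, 2)))
    return rows
```

Feeding it the header values from a mailbox or quarantine directory
gives per-rule counts much like the sa-stats tables in this thread,
though it says nothing about the points each rule contributed.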

My HOSTKARMA rules are being handled by the KHOP channel, but for some
reason I don't even see that listed on mine. This might have helped me
to locate a problem.

Thanks,
Alex

Re: Top Ten Rules

Posted by ram <ta...@gmail.com>.
On Fri, Apr 23, 2010 at 1:06 AM, Alex <my...@gmail.com> wrote:

> Hi,
>
> >> How many entries? Does it just keep growing? We have a local one too,
> >> and every so often correlate it with the public RBLs so as to not
> >> duplicate the check and overhead.
> >
> > They expire in 2 weeks. They should make it into a public RBL by
> > that time. Maybe it should even be shorter.
>
> I'm not sure that's the best approach. I can't say definitively, of
> course, but that seems very quick for them to automatically be
> expunged after two weeks.
>
> Do you have routines that query the blacklists periodically and remove
> the entries from your list based on the query result?
>
> I think that if you thought it was spam at one point, and even several
> months later it hasn't been listed on one of the public RBLs, then
> either submit it to them, or at least keep it on your list or recheck
> it manually.
>
> Of course it depends on your workload, inherent benefit, etc...
>
> >> Sender address? Are you talking about protection from dictionary
> >> attacks, like aaa@columbia.edu, bbb@... etc?
> >
> > If the sender claims to be aaa@columbia.edu, then we can verify whether
> > the localpart aaa exists. Our own domain is the only one for which we
> > can check localpart, of course. If it does not exist, goodbye.
>
> Ah, that's a different matter. That's an easy one that we all do too.
>
> > Joseph Brennan
> > Columbia University Information Technology
>
> It would be very cool to work at Columbia :-)
>
> Regards,
> Alex
>



My stats from a new server look like this (site-wide SpamAssassin).

Is SpamAssassin configured well?

Any suggestions?


./sa-stats

Email:     3347  Autolearn:  1422  AvgScore:   1.44  AvgScanTime:  8.03 sec
Spam:       689  Autolearn:   287  AvgScore:  11.72  AvgScanTime:  8.16 sec
Ham:       2658  Autolearn:  1135  AvgScore:  -1.23  AvgScanTime:  8.00 sec
Time Spent Running SA:         7.47 hours
Time Spent Processing Spam:    1.56 hours
Time Spent Processing Ham:     5.90 hours
TOP SPAM RULES FIRED
----------------------------------------------------------------------
RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM
----------------------------------------------------------------------
   1    HTML_MESSAGE                      455    69.82   66.04   70.81
   2    RAZOR2_CHECK                      409    15.72   59.36    4.40
   3    RAZOR2_CF_RANGE_51_100            389    14.40   56.46    3.50
   4    BAYES_99                          357    10.67   51.81    0.00
   5    RAZOR2_CF_RANGE_E4_51_100         259     8.25   37.59    0.64
   6    AWL                               251    67.85   36.43   76.00
   7    RAZOR2_CF_RANGE_E8_51_100         230     9.17   33.38    2.90
   8    PYZOR_CHECK                       223     7.59   32.37    1.17
   9    MIME_HTML_ONLY                    220    22.74   31.93   20.35
  10    URIBL_BLACK                       208     7.92   30.19    2.14
  11    DIGEST_MULTIPLE                   200     6.01   29.03    0.04
  12    URIBL_JP_SURBL                    172     5.32   24.96    0.23
  13    BAYES_50                          157     7.80   22.79    3.91
  14    RDNS_NONE                         148     9.59   21.48    6.51
  15    SUBJ_ALL_CAPS                     147     7.38   21.34    3.76
  16    FORGED_MUA_OUTLOOK                129     4.51   18.72    0.83
  17    MISSING_HEADERS                   129     5.08   18.72    1.54
  18    RCVD_IN_SORBS_WEB                 126     8.37   18.29    5.79
  19    URIBL_WS_SURBL                    124     3.79   18.00    0.11
  20    HTML_MIME_NO_HTML_TAG             121     7.83   17.56    5.30
----------------------------------------------------------------------
TOP HAM RULES FIRED
----------------------------------------------------------------------
RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM
----------------------------------------------------------------------
   1    BAYES_00                         2491    75.83    6.82   93.72
   2    AWL                              2020    67.85   36.43   76.00
   3    HTML_MESSAGE                     1882    69.82   66.04   70.81
   4    SPF_HELO_PASS                     577    17.90    3.19   21.71
   5    MIME_HTML_ONLY                    541    22.74   31.93   20.35
   6    DEAR_SOMETHING                    276     9.08    4.06   10.38
   7    RCVD_IN_DNSWL_MED                 195     5.92    0.44    7.34
   8    MISSING_MID                       192     8.93   15.53    7.22
   9    RDNS_NONE                         173     9.59   21.48    6.51
  10    RCVD_IN_SORBS_WEB                 154     8.37   18.29    5.79
  11    HTML_MIME_NO_HTML_TAG             141     7.83   17.56    5.30
  12    RCVD_IN_DNSWL_LOW                 119     6.30   13.35    4.48
  13    RAZOR2_CHECK                      117    15.72   59.36    4.40
  14    MIME_QP_LONG_LINE                 110     4.06    3.77    4.14
  15    BAYES_50                          104     7.80   22.79    3.91
  16    HABEAS_ACCREDITED_SOI             100     2.99    0.00    3.76
  17    SUBJ_ALL_CAPS                     100     7.38   21.34    3.76
  18    RAZOR2_CF_RANGE_51_100             93    14.40   56.46    3.50
  19    HTML_IMAGE_RATIO_02                89     3.70    5.08    3.35
  20    RCVD_IN_BSP_TRUSTED                89     2.66    0.00    3.35
----------------------------------------------------------------------

Re: Top Ten Rules

Posted by Alex <my...@gmail.com>.
Hi,

>> How many entries? Does it just keep growing? We have a local one too,
>> and every so often correlate it with the public RBLs so as to not
>> duplicate the check and overhead.
>
> They expire in 2 weeks. They should make it into a public RBL by
> that time. Maybe it should even be shorter.

I'm not sure that's the best approach. I can't say definitively, of
course, but that seems very quick for them to automatically be
expunged after two weeks.

Do you have routines that query the blacklists periodically and remove
the entries from your list based on the query result?
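
The kind of routine I mean could look roughly like the sketch below.
It builds the standard DNSBL query name (IPv4 octets reversed, then the
zone) and splits a local list into entries a public list already covers
and entries still worth keeping. The function names are hypothetical,
and treating any A record as "listed" is a simplification: a real check
would look at the returned address, since lists use return codes to
carry meaning.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNSBL lookup name: IPv4 octets reversed, then the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """True if the DNSBL zone returns an A record for this address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


def prune_publicly_listed(local_ips, zone, lookup=is_listed):
    """Split the local list into (still needed locally, covered publicly)."""
    keep, covered = [], []
    for ip in local_ips:
        (covered if lookup(ip, zone) else keep).append(ip)
    return keep, covered
```

The lookup is passed in as a parameter so the pruning logic can be
tried against a fake resolver before pointing it at a real zone.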

I think that if you thought it was spam at one point, and even several
months later it hasn't been listed on one of the public RBLs, then
either submit it to them, or at least keep it on your list or recheck
it manually.

Of course it depends on your workload, inherent benefit, etc...

>> Sender address? Are you talking about protection from dictionary
>> attacks, like aaa@columbia.edu, bbb@... etc?
>
> If the sender claims to be aaa@columbia.edu, then we can verify whether
> the localpart aaa exists. Our own domain is the only one for which we
> can check localpart, of course. If it does not exist, goodbye.

Ah, that's a different matter. That's an easy one that we all do too.

> Joseph Brennan
> Columbia University Information Technology

It would be very cool to work at Columbia :-)

Regards,
Alex

Re: Top Ten Rules

Posted by Joseph Brennan <br...@columbia.edu>.

>>  29,148 messages : Host sending mail was in our local blocklist
>
> How many entries? Does it just keep growing? We have a local one too,
> and every so often correlate it with the public RBLs so as to not
> duplicate the check and overhead.

They expire in 2 weeks. They should make it into a public RBL by
that time. Maybe it should even be shorter.
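
A two-week expiry of that sort can be sketched as below. This is not
the actual Columbia code; the dict-based storage and the names
add_report and expire are assumptions for illustration. Whether a
repeat report should restart the clock is also an assumption; here it
does.

```python
from datetime import datetime, timedelta

EXPIRY = timedelta(days=14)  # "expire in 2 weeks", per the post


def add_report(blocklist, ip, when):
    """Record a reported host; a repeat report refreshes its timestamp."""
    blocklist[ip] = when


def expire(blocklist, now, max_age=EXPIRY):
    """Return a new dict without entries older than max_age."""
    return {ip: seen for ip, seen in blocklist.items() if now - seen <= max_age}
```

Shortening the window, as suggested, is then just a smaller max_age.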



>>  46,037 -------- : Sender address was a user unknown @columbia.edu
>>                        done by Sendmail, local ruleset
>
> Sender address? Are you talking about protection from dictionary
> attacks, like aaa@columbia.edu, bbb@... etc?

If the sender claims to be aaa@columbia.edu, then we can verify whether
the localpart aaa exists. Our own domain is the only one for which we
can check localpart, of course. If it does not exist, goodbye.

It's not 100% spam. A few are legit users who mistyped the address in
a mail client configuration. It's worthwhile giving them an error too,
so they'll know about it.
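
In outline, the sender check works like the sketch below. The real
thing is a Sendmail ruleset; this Python version only illustrates the
decision, and the function name, the set of known localparts, and the
case-insensitive comparison are all assumptions.

```python
def check_sender(sender, local_domain, known_localparts):
    """Return "reject" for a nonexistent local sender, otherwise "accept"."""
    localpart, _, domain = sender.rpartition("@")
    if domain.lower() != local_domain.lower():
        return "accept"  # localparts at other domains cannot be verified here
    if localpart.lower() in known_localparts:
        return "accept"
    return "reject"  # claims to be local, but no such user exists
```

Rejecting with an error, rather than silently dropping, is what lets a
legitimate user with a mistyped address find out about it.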



Joseph Brennan
Columbia University Information Technology



Re: Top Ten Rules

Posted by Alex <my...@gmail.com>.
Hi,

>  29,148 messages : Host sending mail was in our local blocklist
>                        note below *

How many entries? Does it just keep growing? We have a local one too,
and every so often correlate it with the public RBLs so as to not
duplicate the check and overhead.

>  42,132 -------- : Sender domain unknown
>                        done by Sendmail, standard ruleset

These are rejected outright, correct?

>  46,037 -------- : Sender address was a user unknown @columbia.edu
>                        done by Sendmail, local ruleset

Sender address? Are you talking about protection from dictionary
attacks, like aaa@columbia.edu, bbb@... etc?

> 1,222,373 -------- : RBL, host sending mail was in Spamhaus
>                        done by Mimedefang

Do you know what the postfix equivalent would be for mimedefang?

> I know the RBLs would be done by SpamAssassin in many installations.
> We found it more efficient to knock off the low-hanging fruit first,
> and then run SpamAssassin on what's left.

Sure, I think we all take that approach -- to reject outright email
that doesn't fit our policy, such as failed helo checks, zen checks at
smtp time, and unknown (unresolvable) domains.

Thanks,
Alex

Re: Top Ten Rules

Posted by Joseph Brennan <br...@columbia.edu>.
We run some rules in Sendmail and Mimedefang that cause rejection
before Mimedefang would run the SpamAssassin library against the
messages.

In the order the rules get hit, rejection counts from yesterday:

   29,148 messages : Host sending mail was in our local blocklist
			note below *
			
   42,132 -------- : Sender domain unknown
			done by Sendmail, standard ruleset

   46,037 -------- : Sender address was a user unknown @columbia.edu
			done by Sendmail, local ruleset

1,222,373 -------- : RBL, host sending mail was in Spamhaus
			done by Mimedefang

   44,091 -------- : RBL, URI in text was in Spamhaus
			done by Mimedefang, local code

   60,178 -------- : SpamAssassin score was 8.0 or higher
			includes some local rules


I know the RBLs would be done by SpamAssassin in many installations.
We found it more efficient to knock off the low-hanging fruit first,
and then run SpamAssassin on what's left.

(* Currently done as sendmail access.db, to be converted to an RBL
to be checked by Mimedefang. Based on spam reports from our users.)
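
The ordering above amounts to something like the following sketch:
cheap checks run first, and the expensive scorer only sees what
survives. The function name and the message fields are invented for
illustration; only the 8.0 threshold comes from the counts above.

```python
def run_pipeline(message, checks, score_fn, threshold=8.0):
    """Apply cheap checks in order; score only messages that survive them."""
    for name, rejects in checks:
        if rejects(message):
            return ("reject", name)  # stop at the first check that fires
    score = score_fn(message)  # the expensive step, e.g. SpamAssassin
    if score >= threshold:
        return ("reject", "spamassassin")
    return ("accept", None)


# Hypothetical cheap checks, listed in the order they should run.
CHECKS = [
    ("local_blocklist", lambda m: m["ip"] in {"192.0.2.10"}),
    ("sender_domain_unknown", lambda m: m["domain"] is None),
]
```

Because a rejected message returns before score_fn is called, the
cost of scoring is paid only for mail that passes every earlier stage.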



Joseph Brennan
Columbia University Information Technology



Re: Top Ten Rules

Posted by Jari Fredriksson <ja...@iki.fi>.
On 19.4.2010 9:29, Alex wrote:
> Hi all,
> 

Here is my top 20:

TOP SPAM RULES FIRED
----------------------------------------------------------------------
RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM
----------------------------------------------------------------------
   1    BAYES_99                         9484    46.62   97.87    0.00
   2    BOTNET                           9024    44.38   93.13    0.05
   3    RCVD_IN_BRBL_LASTEXT             8744    42.99   90.24    0.01
   4    RCVD_IN_HOSTKARMA_BL             8581    42.18   88.56    0.01
   5    RCVD_IN_XBL                      8283    40.71   85.48    0.00
   6    RAZOR2_CHECK                     7696    37.88   79.42    0.10
   7    RAZOR2_CF_RANGE_51_100           7627    37.52   78.71    0.06
   8    HTML_MESSAGE                     7590    50.63   78.33   25.45
   9    URIBL_BLACK                      7141    35.23   73.69    0.25
  10    RAZOR2_CF_RANGE_E8_51_100        7075    34.81   73.01    0.06
  11    RCVD_IN_NIX_SPAM                 7005    34.52   72.29    0.16
  12    RCVD_IN_BL_SPAMCOP_NET           6436    31.77   66.42    0.26
  13    RCVD_IN_PSBL                     6306    31.00   65.08    0.00
  14    URIBL_DBL_SPAM                   6152    30.24   63.49    0.00
  15    MIME_HTML_ONLY                   6025    30.60   62.18    1.88
  16    T_URIBL_BLACK_OVERLAP            5477    26.99   56.52    0.13
  17    RCVD_IN_SORBS_WEB                5298    26.60   54.67    1.06
  18    URIBL_JP_SURBL                   5132    25.29   52.96    0.13
  19    KHOP_DNSBL_ADJ                   5058    24.86   52.20    0.00
  20    RDNS_NONE                        4877    24.00   50.33    0.06


-- 
http://www.iki.fi/jarif/

The man who sets out to carry a cat by its tail learns something that
will always be useful and which never will grow dim or doubtful.
		-- Mark Twain


Re: Top Ten Rules

Posted by Alex <my...@gmail.com>.
Hi,

>> Thought I would share what my top ten rules are for the past few days,
>> and see if they compare with the consensus on the list:
>
> I do not have such a thing. I have a cumulative count of the hits for about
> the last 4 years. These are the top ten per pure occurrence
>
> +------------------------+---------+
> | rule                   | noccur  |
> +------------------------+---------+
> | BAYES_99               |  955517 |
> | HTML_MESSAGE           |  892230 |

The format looks like it came from a database table dump? That would
be a good idea. I'd like to experiment with the SA Stats plugin for
mysql and see what type of manipulation can be done with the data.
Anyone have any experience with the Stats plugin?

Thanks,
Alex

Re: Top Ten Rules

Posted by Lucio Chiappetti <lu...@lambrate.inaf.it>.
On Mon, 19 Apr 2010, Alex wrote:

> Thought I would share what my top ten rules are for the past few days,
> and see if they compare with the consensus on the list:

I do not have such a thing. I have a cumulative count of the hits for 
about the last 4 years. These are the top ten per pure occurrence

+------------------------+---------+
| rule                   | noccur  |
+------------------------+---------+
| BAYES_99               |  955517 |
| HTML_MESSAGE           |  892230 |
| RAZOR2_CHECK           |  832501 |
| RAZOR2_CF_RANGE_51_100 |  808921 |
| DCC_CHECK              |  739857 |
| DIGEST_MULTIPLE        |  597328 |
| RCVD_IN_BL_SPAMCOP_NET |  450102 |
| URIBL_JP_SURBL         |  400931 |
| MIME_HTML_ONLY         |  349913 |
| RCVD_IN_SORBS_DUL      |  295590 |
+------------------------+---------+

These are the top ten by number of occurrences multiplied by the score:

+------------------------+--------+-------+
| rule                   | noccur | score |
+------------------------+--------+-------+
| BAYES_99               | 955517 |   3.5 |
| DCC_CHECK              | 739857 | 2.169 |
| RAZOR2_CHECK           | 832501 | 1.511 |
| URIBL_SC_SURBL         | 238453 | 4.263 |
| URIBL_JP_SURBL         | 400931 | 2.462 |
| URIBL_OB_SURBL         | 230771 | 3.213 |
| RCVD_IN_SORBS_DUL      | 295590 | 1.987 |
| RCVD_IN_XBL            | 181377 | 3.076 |
| RCVD_IN_BL_SPAMCOP_NET | 450102 | 1.216 |
| HELO_DYNAMIC_IPADDR    | 109448 |   4.4 |
+------------------------+--------+-------+
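
For anyone wanting to reproduce the second ranking, the calculation is
just noccur * score, sorted in descending order. A minimal sketch,
using four rows from the table above (the function name is made up):

```python
def rank_by_weight(rows, limit=10):
    """rows: (rule, noccur, score) tuples. Sort by noccur * score, descending."""
    weighted = [(rule, noccur, score, noccur * score) for rule, noccur, score in rows]
    weighted.sort(key=lambda r: r[3], reverse=True)
    return weighted[:limit]


# Rows taken from the table above, listed here in order of raw occurrence.
rows = [
    ("BAYES_99", 955517, 3.5),
    ("RAZOR2_CHECK", 832501, 1.511),
    ("DCC_CHECK", 739857, 2.169),
    ("URIBL_SC_SURBL", 238453, 4.263),
]

for rule, noccur, score, weight in rank_by_weight(rows):
    print(f"{rule:<24} {noccur:>7} {score:>6} {weight:>12.1f}")
```

DCC_CHECK fires less often than RAZOR2_CHECK but carries a higher
score, so it moves ahead once the score is taken into account.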

-- 
------------------------------------------------------------------------
Lucio Chiappetti - INAF/IASF - via Bassini 15 - I-20133 Milano (Italy)
For more info : http://www.iasf-milano.inaf.it/~lucio/personal.html
------------------------------------------------------------------------
Multi pertransibunt et augebitur scientia
              Francis Bacon Instauratio Magna (http://tinyurl.com/2j3qk5)
------------------------------------------------------------------------