Posted to users@spamassassin.apache.org by Alex <my...@gmail.com> on 2010/04/19 08:29:48 UTC
Top Ten Rules
Hi all,
Thought I would share what my top ten rules are for the past few days,
and see if they compare with the consensus on the list:
BAYES_99 76.5%
HTML_MESSAGE 76.0%
RAZOR2_CHECK 72.8%
RAZOR2_CF_RANGE_51_100 69.7%
RAZOR2_CF_RANGE_E8_51_100 59.9%
RELAYCOUNTRY_US 59.4%
SPF_PASS 58.6%
SPF_HELO_PASS 58.6%
URIBL_BLACK 57.4%
MIME_HTML_ONLY 51.5%
SEM_URIRED 48.2%
RCVD_IN_SEMBLACK 36.3%
URIBL_JP_SURBL 35.5%
RDNS_NONE 34.6%
I know it's subjective based on site email, but if there were some
large disparity, perhaps it would highlight where a problem might be
(for me or someone else...)
Thanks,
Alex
Re: Top Ten Rules
Posted by Per Jessen <pe...@computer.org>.
Alex wrote:
> Hi all,
>
> Thought I would share what my top ten rules are for the past few days,
> and see if they compare with the consensus on the list:
I think you ought to count the points scored, not just the rules hit.
/Per Jessen, Zürich
Re: Top Ten Rules
Posted by Alex <my...@gmail.com>.
Hi,
> 1 BAYES_99 9484 46.62 97.87 0.00
> 2 BOTNET 9024 44.38 93.13 0.05
> 3 RCVD_IN_BRBL_LASTEXT 8744 42.99 90.24 0.01
> 4 RCVD_IN_HOSTKARMA_BL 8581 42.18 88.56 0.01
> 5 RCVD_IN_XBL 8283 40.71 85.48 0.00
> 6 RAZOR2_CHECK 7696 37.88 79.42 0.10
Quite a difference in the top few from mine. I like the results from
the Barracuda list. I have to upgrade to v3.3 soon.
I wish I had something better than my homegrown scripts for
generating these stats, since I'm not using spamd. Some type of
real-time "spamtop" would be really neat.
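A minimal sketch of the kind of homegrown tally mentioned above, assuming each scanned message carries an X-Spam-Status header with a tests= list (the exact header layout depends on local configuration). The names rules_in_status and top_rules are hypothetical, and this is a batch count over collected header values, not a real-time "spamtop".

```python
import re
from collections import Counter


def rules_in_status(status_value):
    """Return the rule names listed in one X-Spam-Status header value."""
    # Folded headers continue on the next line after a comma; rejoin them.
    unfolded = re.sub(r",\s+", ",", status_value)
    match = re.search(r"tests=(\S+)", unfolded)
    if not match or match.group(1).lower() == "none":
        return []
    return [name for name in match.group(1).split(",") if name]


def top_rules(status_values, limit=10):
    """Rank rules by the percentage of messages that hit them."""
    values = list(status_values)
    counts = Counter()
    for value in values:
        # Count each rule at most once per message.
        counts.update(set(rules_in_status(value)))
    total = len(values) or 1
    return [(rule, 100.0 * hits / total)
            for rule, hits in counts.most_common(limit)]
```

Feeding it only the headers of messages classified as spam gives a list comparable to the percentages at the start of the thread.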
My HOSTKARMA rules are being handled by the KHOP channel, but for some
reason I don't even see that listed on mine. This might have helped me
to locate a problem.
Thanks,
Alex
Re: Top Ten Rules
Posted by ram <ta...@gmail.com>.
On Fri, Apr 23, 2010 at 1:06 AM, Alex <my...@gmail.com> wrote:
> Hi,
>
> >> How many entries? Does it just keep growing? We have a local one too,
> >> and every so often correlate it with the public RBLs so as to not
> >> duplicate the check and overhead.
> >
> > They expire in 2 weeks. They should make it into a public RBL by
> > that time. Maybe it should even be shorter.
>
> I'm not sure that's the best approach. I can't say definitively, of
> course, but that seems very quick for them to automatically be
> expunged after two weeks.
>
> Do you have routines that query the blacklists periodically and remove
> the entries from your list based on the query result?
>
> I think that if you thought it was spam at one point, and even several
> months later it hasn't been listed on one of the public RBLs, then
> either submit it to them, or at least keep it on your list or recheck
> it manually.
>
> Of course it depends on your workload, inherent benefit, etc...
>
> >> Sender address? Are you talking about protection from dictionary
> >> attacks, like aaa@columbia.edu, bbb@... etc?
> >
> > If the sender claims to be aaa@columbia.edu, then we can verify whether
> > the localpart aaa exists. Our own domain is the only one for which we
> > can check localpart, of course. If it does not exist, goodbye.
>
> Ah, that's a different matter. That's an easy one that we all do too.
>
> > Joseph Brennan
> > Columbia University Information Technology
>
> It would be very cool to work at Columbia :-)
>
> Regards,
> Alex
>
My stats from a new server (sitewide SpamAssassin) look like this.
Is SpamAssassin configured well?
Any suggestions?
./sa-stats
Email: 3347 Autolearn: 1422 AvgScore: 1.44 AvgScanTime: 8.03 sec
Spam: 689 Autolearn: 287 AvgScore: 11.72 AvgScanTime: 8.16 sec
Ham: 2658 Autolearn: 1135 AvgScore: -1.23 AvgScanTime: 8.00 sec
Time Spent Running SA: 7.47 hours
Time Spent Processing Spam: 1.56 hours
Time Spent Processing Ham: 5.90 hours
TOP SPAM RULES FIRED
----------------------------------------------------------------------
RANK    RULE NAME                  COUNT  %OFMAIL  %OFSPAM   %OFHAM
----------------------------------------------------------------------
   1    HTML_MESSAGE                 455    69.82    66.04    70.81
   2    RAZOR2_CHECK                 409    15.72    59.36     4.40
   3    RAZOR2_CF_RANGE_51_100       389    14.40    56.46     3.50
   4    BAYES_99                     357    10.67    51.81     0.00
   5    RAZOR2_CF_RANGE_E4_51_100    259     8.25    37.59     0.64
   6    AWL                          251    67.85    36.43    76.00
   7    RAZOR2_CF_RANGE_E8_51_100    230     9.17    33.38     2.90
   8    PYZOR_CHECK                  223     7.59    32.37     1.17
   9    MIME_HTML_ONLY               220    22.74    31.93    20.35
  10    URIBL_BLACK                  208     7.92    30.19     2.14
  11    DIGEST_MULTIPLE              200     6.01    29.03     0.04
  12    URIBL_JP_SURBL               172     5.32    24.96     0.23
  13    BAYES_50                     157     7.80    22.79     3.91
  14    RDNS_NONE                    148     9.59    21.48     6.51
  15    SUBJ_ALL_CAPS                147     7.38    21.34     3.76
  16    FORGED_MUA_OUTLOOK           129     4.51    18.72     0.83
  17    MISSING_HEADERS              129     5.08    18.72     1.54
  18    RCVD_IN_SORBS_WEB            126     8.37    18.29     5.79
  19    URIBL_WS_SURBL               124     3.79    18.00     0.11
  20    HTML_MIME_NO_HTML_TAG        121     7.83    17.56     5.30
----------------------------------------------------------------------
TOP HAM RULES FIRED
----------------------------------------------------------------------
RANK    RULE NAME                  COUNT  %OFMAIL  %OFSPAM   %OFHAM
----------------------------------------------------------------------
   1    BAYES_00                    2491    75.83     6.82    93.72
   2    AWL                         2020    67.85    36.43    76.00
   3    HTML_MESSAGE                1882    69.82    66.04    70.81
   4    SPF_HELO_PASS                577    17.90     3.19    21.71
   5    MIME_HTML_ONLY               541    22.74    31.93    20.35
   6    DEAR_SOMETHING               276     9.08     4.06    10.38
   7    RCVD_IN_DNSWL_MED            195     5.92     0.44     7.34
   8    MISSING_MID                  192     8.93    15.53     7.22
   9    RDNS_NONE                    173     9.59    21.48     6.51
  10    RCVD_IN_SORBS_WEB            154     8.37    18.29     5.79
  11    HTML_MIME_NO_HTML_TAG        141     7.83    17.56     5.30
  12    RCVD_IN_DNSWL_LOW            119     6.30    13.35     4.48
  13    RAZOR2_CHECK                 117    15.72    59.36     4.40
  14    MIME_QP_LONG_LINE            110     4.06     3.77     4.14
  15    BAYES_50                     104     7.80    22.79     3.91
  16    HABEAS_ACCREDITED_SOI        100     2.99     0.00     3.76
  17    SUBJ_ALL_CAPS                100     7.38    21.34     3.76
  18    RAZOR2_CF_RANGE_51_100        93    14.40    56.46     3.50
  19    HTML_IMAGE_RATIO_02           89     3.70     5.08     3.35
  20    RCVD_IN_BSP_TRUSTED           89     2.66     0.00     3.35
----------------------------------------------------------------------
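For anyone reading the three percentage columns, here is a small sketch of how they appear to relate to the counts and to the Spam/Ham totals in the summary (689 spam, 2658 ham). This is inferred from the numbers in this report, not taken from the sa-stats source, and rule_percentages is a hypothetical helper name.

```python
def rule_percentages(spam_hits, ham_hits, spam_total, ham_total):
    """Derive the %OFMAIL, %OFSPAM and %OFHAM columns for one rule."""
    def pct(part, whole):
        # Guard against an empty class so the sketch never divides by zero.
        return round(100.0 * part / whole, 2) if whole else 0.0

    return {
        "%OFMAIL": pct(spam_hits + ham_hits, spam_total + ham_total),
        "%OFSPAM": pct(spam_hits, spam_total),
        "%OFHAM": pct(ham_hits, ham_total),
    }


# HTML_MESSAGE: 455 spam hits and 1882 ham hits in the tables above.
html_message = rule_percentages(455, 1882, 689, 2658)
```

With the HTML_MESSAGE counts this reproduces the 69.82 / 66.04 / 70.81 shown in the report. A rule that fires on a larger share of ham than of spam, as HTML_MESSAGE does here, says little about spamminess by itself.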
Re: Top Ten Rules
Posted by Alex <my...@gmail.com>.
Hi,
>> How many entries? Does it just keep growing? We have a local one too,
>> and every so often correlate it with the public RBLs so as to not
>> duplicate the check and overhead.
>
> They expire in 2 weeks. They should make it into a public RBL by
> that time. Maybe it should even be shorter.
I'm not sure that's the best approach. I can't say definitively, of
course, but that seems very quick for them to automatically be
expunged after two weeks.
Do you have routines that query the blacklists periodically and remove
the entries from your list based on the query result?
I think that if you thought it was spam at one point, and even several
months later it hasn't been listed on one of the public RBLs, then
either submit it to them, or at least keep it on your list or recheck
it manually.
Of course it depends on your workload, inherent benefit, etc...
>> Sender address? Are you talking about protection from dictionary
>> attacks, like aaa@columbia.edu, bbb@... etc?
>
> If the sender claims to be aaa@columbia.edu, then we can verify whether
> the localpart aaa exists. Our own domain is the only one for which we
> can check localpart, of course. If it does not exist, goodbye.
Ah, that's a different matter. That's an easy one that we all do too.
> Joseph Brennan
> Columbia University Information Technology
It would be very cool to work at Columbia :-)
Regards,
Alex
Re: Top Ten Rules
Posted by Joseph Brennan <br...@columbia.edu>.
>> 29,148 messages : Host sending mail was in our local blocklist
>
> How many entries? Does it just keep growing? We have a local one too,
> and every so often correlate it with the public RBLs so as to not
> duplicate the check and overhead.
They expire in 2 weeks. They should make it into a public RBL by
that time. Maybe it should even be shorter.
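As an illustration only, a two-week expiry like this could be sketched as below. This is not how the list described here is implemented (it lives in a sendmail access.db); the class name ExpiringBlocklist and its methods are hypothetical.

```python
import time

TWO_WEEKS = 14 * 24 * 60 * 60  # expiry period from the post, in seconds


class ExpiringBlocklist:
    """Hosts are blocked from the moment they are added until the TTL passes."""

    def __init__(self, ttl_seconds=TWO_WEEKS):
        self.ttl = ttl_seconds
        self._added = {}  # host -> time it was listed

    def add(self, host, now=None):
        self._added[host] = time.time() if now is None else now

    def is_blocked(self, host, now=None):
        now = time.time() if now is None else now
        added = self._added.get(host)
        if added is None:
            return False
        if now - added >= self.ttl:
            # Entry has aged out; drop it so the list does not keep growing.
            del self._added[host]
            return False
        return True
```

Passing a shorter ttl_seconds is all it would take to try the "maybe even shorter" idea.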
>> 46,037 -------- : Sender address was a user unknown @columbia.edu
>> done by Sendmail, local ruleset
>
> Sender address? Are you talking about protection from dictionary
> attacks, like aaa@columbia.edu, bbb@... etc?
If the sender claims to be aaa@columbia.edu, then we can verify whether
the localpart aaa exists. Our own domain is the only one for which we
can check localpart, of course. If it does not exist, goodbye.
It's not 100% spam. A few are legit users who mistyped the address in
a mail client configuration. It's worthwhile giving them an error too,
so they'll know about it.
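A rough sketch of the check described above, assuming the set of valid local parts is available as a plain set and that matching is case-insensitive. The real check is a Sendmail ruleset; sender_is_plausible and known_localparts are hypothetical names.

```python
def sender_is_plausible(sender, local_domain, known_localparts):
    """Return False only when the sender claims a local address that does not exist."""
    localpart, _, domain = sender.rpartition("@")
    if domain.lower() != local_domain.lower():
        # Local parts can only be verified for our own domain.
        return True
    return localpart.lower() in known_localparts


known = {"aaa"}  # assumed sample data
ok = sender_is_plausible("aaa@columbia.edu", "columbia.edu", known)
bad = sender_is_plausible("zzz@columbia.edu", "columbia.edu", known)
```

A False result maps to the "goodbye" case, which also gives a mistyped legitimate user the error described above.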
Joseph Brennan
Columbia University Information Technology
Re: Top Ten Rules
Posted by Alex <my...@gmail.com>.
Hi,
> 29,148 messages : Host sending mail was in our local blocklist
> note below *
How many entries? Does it just keep growing? We have a local one too,
and every so often correlate it with the public RBLs so as to not
duplicate the check and overhead.
> 42,132 -------- : Sender domain unknown
> done by Sendmail, standard ruleset
These are rejected outright, correct?
> 46,037 -------- : Sender address was a user unknown @columbia.edu
> done by Sendmail, local ruleset
Sender address? Are you talking about protection from dictionary
attacks, like aaa@columbia.edu, bbb@... etc?
> 1,222,373 -------- : RBL, host sending mail was in Spamhaus
> done by Mimedefang
Do you know what the postfix equivalent would be for mimedefang?
> I know the RBLs would be done by SpamAssassin in many installations.
> We found it more efficient to knock off the low-hanging fruit first,
> and then run SpamAssassin on what's left.
Sure, I think we all take that approach -- to reject outright email
that doesn't fit our policy, such as failed helo checks, zen checks at
smtp time, and unknown (unresolvable) domains.
Thanks,
Alex
Re: Top Ten Rules
Posted by Joseph Brennan <br...@columbia.edu>.
We run some rules in Sendmail and Mimedefang that cause rejection
before Mimedefang would run the SpamAssassin library against the
messages.
In the order the rules get hit, rejection counts from yesterday:
   29,148 messages : Host sending mail was in our local blocklist
                     note below *
   42,132 -------- : Sender domain unknown
                     done by Sendmail, standard ruleset
   46,037 -------- : Sender address was a user unknown @columbia.edu
                     done by Sendmail, local ruleset
1,222,373 -------- : RBL, host sending mail was in Spamhaus
                     done by Mimedefang
   44,091 -------- : RBL, URI in text was in Spamhaus
                     done by Mimedefang, local code
   60,178 -------- : SpamAssassin score was 8.0 or higher
                     includes some local rules
I know the RBLs would be done by SpamAssassin in many installations.
We found it more efficient to knock off the low-hanging fruit first,
and then run SpamAssassin on what's left.
(* Currently done as sendmail access.db, to be converted to an RBL
to be checked by Mimedefang. Based on spam reports from our users.)
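The ordering idea (cheap rejections first, the expensive scan last) can be sketched roughly as follows. This is a generic illustration, not the Sendmail/Mimedefang configuration; first_rejection, cheap_checks and score_message are hypothetical names, and only the 8.0 threshold comes from the list above.

```python
SA_REJECT_SCORE = 8.0  # threshold quoted in the list above


def first_rejection(message, cheap_checks, score_message):
    """Return the reason for rejecting `message`, or None to accept it.

    cheap_checks is an ordered list of (reason, predicate) pairs;
    score_message stands in for the expensive content scan.
    """
    for reason, check in cheap_checks:
        if check(message):
            return reason  # stop early; the expensive scan never runs
    if score_message(message) >= SA_REJECT_SCORE:
        return "SpamAssassin score was 8.0 or higher"
    return None
```

Because the loop returns on the first hit, each message is counted under exactly one reason, which matches counts reported "in the order the rules get hit".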
Joseph Brennan
Columbia University Information Technology
Re: Top Ten Rules
Posted by Jari Fredriksson <ja...@iki.fi>.
On 19.4.2010 9:29, Alex wrote:
> Hi all,
>
Here is my top 20:
TOP SPAM RULES FIRED
----------------------------------------------------------------------
RANK    RULE NAME                  COUNT  %OFMAIL  %OFSPAM   %OFHAM
----------------------------------------------------------------------
   1    BAYES_99                    9484    46.62    97.87     0.00
   2    BOTNET                      9024    44.38    93.13     0.05
   3    RCVD_IN_BRBL_LASTEXT        8744    42.99    90.24     0.01
   4    RCVD_IN_HOSTKARMA_BL        8581    42.18    88.56     0.01
   5    RCVD_IN_XBL                 8283    40.71    85.48     0.00
   6    RAZOR2_CHECK                7696    37.88    79.42     0.10
   7    RAZOR2_CF_RANGE_51_100      7627    37.52    78.71     0.06
   8    HTML_MESSAGE                7590    50.63    78.33    25.45
   9    URIBL_BLACK                 7141    35.23    73.69     0.25
  10    RAZOR2_CF_RANGE_E8_51_100   7075    34.81    73.01     0.06
  11    RCVD_IN_NIX_SPAM            7005    34.52    72.29     0.16
  12    RCVD_IN_BL_SPAMCOP_NET      6436    31.77    66.42     0.26
  13    RCVD_IN_PSBL                6306    31.00    65.08     0.00
  14    URIBL_DBL_SPAM              6152    30.24    63.49     0.00
  15    MIME_HTML_ONLY              6025    30.60    62.18     1.88
  16    T_URIBL_BLACK_OVERLAP       5477    26.99    56.52     0.13
  17    RCVD_IN_SORBS_WEB           5298    26.60    54.67     1.06
  18    URIBL_JP_SURBL              5132    25.29    52.96     0.13
  19    KHOP_DNSBL_ADJ              5058    24.86    52.20     0.00
  20    RDNS_NONE                   4877    24.00    50.33     0.06
--
http://www.iki.fi/jarif/
The man who sets out to carry a cat by its tail learns something that
will always be useful and which never will grow dim or doubtful.
-- Mark Twain
Re: Top Ten Rules
Posted by Alex <my...@gmail.com>.
Hi,
>> Thought I would share what my top ten rules are for the past few days,
>> and see if they compare with the consensus on the list:
>
> I do not have such a thing. I have a cumulative count of the hits for about
> the last 4 years. These are the top ten per pure occurrence
>
> +------------------------+---------+
> | rule | noccur |
> +------------------------+---------+
> | BAYES_99 | 955517 |
> | HTML_MESSAGE | 892230 |
The format looks like it came from a database table dump? That would
be a good idea. I'd like to experiment with the SA Stats plugin for
mysql and see what type of manipulation can be done with the data.
Anyone have any experience with the Stats plugin?
Thanks,
Alex
Re: Top Ten Rules
Posted by Lucio Chiappetti <lu...@lambrate.inaf.it>.
On Mon, 19 Apr 2010, Alex wrote:
> Thought I would share what my top ten rules are for the past few days,
> and see if they compare with the consensus on the list:
I do not have such a thing. I have a cumulative count of the hits for
about the last 4 years. These are the top ten by raw occurrence count:
+------------------------+---------+
| rule                   | noccur  |
+------------------------+---------+
| BAYES_99               |  955517 |
| HTML_MESSAGE           |  892230 |
| RAZOR2_CHECK           |  832501 |
| RAZOR2_CF_RANGE_51_100 |  808921 |
| DCC_CHECK              |  739857 |
| DIGEST_MULTIPLE        |  597328 |
| RCVD_IN_BL_SPAMCOP_NET |  450102 |
| URIBL_JP_SURBL         |  400931 |
| MIME_HTML_ONLY         |  349913 |
| RCVD_IN_SORBS_DUL      |  295590 |
+------------------------+---------+
These are the top ten by number of occurrences multiplied by the score:
+------------------------+--------+-------+
| rule                   | noccur | score |
+------------------------+--------+-------+
| BAYES_99               | 955517 |   3.5 |
| DCC_CHECK              | 739857 | 2.169 |
| RAZOR2_CHECK           | 832501 | 1.511 |
| URIBL_SC_SURBL         | 238453 | 4.263 |
| URIBL_JP_SURBL         | 400931 | 2.462 |
| URIBL_OB_SURBL         | 230771 | 3.213 |
| RCVD_IN_SORBS_DUL      | 295590 | 1.987 |
| RCVD_IN_XBL            | 181377 | 3.076 |
| RCVD_IN_BL_SPAMCOP_NET | 450102 | 1.216 |
| HELO_DYNAMIC_IPADDR    | 109448 |   4.4 |
+------------------------+--------+-------+
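The weighting used for this second table (hits multiplied by score, in the spirit of the earlier suggestion to count points scored) can be sketched in a few lines. The rows are copied from the table above but listed by raw count; rank_by_points is a hypothetical helper name, and the scores are whatever this site uses, not necessarily the stock ones.

```python
# (rule, noccur, score) rows from the table above, ordered here by raw count.
ROWS = [
    ("BAYES_99", 955517, 3.5),
    ("RAZOR2_CHECK", 832501, 1.511),
    ("DCC_CHECK", 739857, 2.169),
    ("RCVD_IN_BL_SPAMCOP_NET", 450102, 1.216),
    ("URIBL_JP_SURBL", 400931, 2.462),
    ("RCVD_IN_SORBS_DUL", 295590, 1.987),
    ("URIBL_SC_SURBL", 238453, 4.263),
    ("URIBL_OB_SURBL", 230771, 3.213),
    ("RCVD_IN_XBL", 181377, 3.076),
    ("HELO_DYNAMIC_IPADDR", 109448, 4.4),
]


def rank_by_points(rows):
    """Sort rules by total points contributed: occurrences times score."""
    return sorted(rows, key=lambda row: row[1] * row[2], reverse=True)


for rule, noccur, score in rank_by_points(ROWS):
    print(f"{rule:<24}{noccur * score:>14.1f}")
```

Sorting this way moves a frequent but low-scoring rule such as RCVD_IN_BL_SPAMCOP_NET below rarer, heavier rules such as URIBL_SC_SURBL, which is the reordering visible between the two tables.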
--
------------------------------------------------------------------------
Lucio Chiappetti - INAF/IASF - via Bassini 15 - I-20133 Milano (Italy)
For more info : http://www.iasf-milano.inaf.it/~lucio/personal.html
------------------------------------------------------------------------
Multi pertransibunt et augebitur scientia
Francis Bacon Instauratio Magna (http://tinyurl.com/2j3qk5)
------------------------------------------------------------------------