Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2009/11/05 00:52:27 UTC
[Bug 6155] generate new scores for 3.3.0 release
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6155
Adam Katz <an...@khopis.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #4561 is|0                           |1
           obsolete|                            |
--- Comment #145 from Adam Katz <an...@khopis.com> 2009-11-04 15:52:15 UTC ---
Created an attachment (id=4564)
--> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4564)
Checker for rules that match more ham than spam
Updated my checker to use S/O (now that I understand that stat). It also
supports specifying the DateRev for the specific masscheck run. Since today's
run was sparse, here are yesterday's results.
$ ./sa33badrules.pl 20091103-r832343-n
S/O   RANK    HAM%   SPAM%  Score in attachment 4558         Rule
.008  .12   1.2401  0.0105    0.001                          MSGID_MULTIPLE_AT
.011  .22   0.3066  0.0035        0                          OBSCURED_EMAIL
.012  .25   0.2058  0.0025    0.000   2.099   0.001   1.212  MISSING_MIME_HB_SEP
.014  .17   0.5822  0.0080    0.001   0.001   0.699   0.699  TVD_RCVD_SPACE_BRACKET
.028  .20   0.4339  0.0125  unknown                          TVD_FUZZY_SECTOR
.042  .28   0.1732  0.0075        0                          SUBJECT_FUZZY_TION
.048  .77   4.4862  0.2279   -0.001                          SPF_HELO_PASS
.052  .29   0.1476  0.0080    1.494   1.699   1.591   1.516  X_IP
.055  .22   0.3914  0.0226    2.205   0.174   1.299   1.806  FRT_SOMA2
.062  .74   5.1484  0.3424   -0.001                          SPF_PASS
.077  .25   0.2643  0.0221    0.987   0.750   0.943   1.318  CTYPE_001C_B
.079  .36   0.0640  0.0055    0.001   0.001   0.605   0.378  HTML_NONELEMENT_30_40
.080  .28   0.1742  0.0151    0.001   2.499   0.268   0.516  DRUGS_MUSCLE
.084  .36   0.0660  0.0060        0                          FORGED_IMS_TAGS
.090  .32   0.1114  0.0110    0.033   0.001   0.365   0.413  WEIRD_PORT
.092  .21   0.8712  0.0878    1.499   0.419   0.904   0.798  MIME_BASE64_BLANKS
.102  .37   0.0577  0.0065        0                          HTML_IFRAME_SRC
.123  .34   0.0821  0.0115    0.003   0.978   0.100   1.515  TVD_FW_GRAPHIC_NAME_LONG
.128  .37   0.0614  0.0090        0                          RCVD_BAD_ID
.130  .29   0.1851  0.0276    0.001   0.020   0.001   1.799  MIME_BASE64_TEXT
.178  .28   0.4948  0.1069        0   1.200       0   2.514  SPF_HELO_FAIL
.202  .32   0.1590  0.0402      0.1                          ANY_BOUNCE_MESSAGE
.205  .35   0.0817  0.0211    2.199   1.622   2.199   1.086  LONGWORDS
.213  .34   0.1186  0.0321        0                          BLANK_LINES_80_90
.216  .32   0.1474  0.0407    2.199   2.199   1.246   2.090  WEIRD_QUOTING
.218  .32   0.1445  0.0402      0.1                          BOUNCE_MESSAGE
.223  .30   0.7605  0.2179    1.799   0.572   1.182   1.138  HTML_IMAGE_RATIO_06
.241  .34   1.3973  0.4438      1.0                          EXTRA_MPART_TYPE
.254  .34   0.1222  0.0417    0.001   2.185   1.936   0.476  FRT_SOMA
.283  .33   0.6883  0.2711    0.539   0.001   0.332   0.488  MIME_HTML_MOSTLY
.299  .36   0.0908  0.0387    0.799   0.001   0.711   0.026  TVD_FW_GRAPHIC_NAME_MID
.303  .34   0.4938  0.2143    1.899   0.496   0.950   0.445  HTML_IMAGE_RATIO_08
.367  .40   1.2775  0.7409    0.001                          TVD_SPACE_RATIO
.379  .37   0.3182  0.1943    0.023   0.887   0.000   0.417  UPPERCASE_50_75
.434  .39   0.3261  0.2505    3.099   1.823   1.802   1.998  BAD_ENC_HEADER
.436  .46  15.3798 11.8920    0.001                          FREEMAIL_FROM
.454  .41   0.5503  0.4573    2.260   0.742   1.199   0.640  MPART_ALT_DIFF
.516  .47   3.6581  3.9024    0.001                          MIME_QP_LONG_LINE
.655  .51   1.9537  3.7036    1.154   1.677   1.198   1.453  SUBJ_ALL_CAPS
.665  .49  42.2269 83.7383    0.001                          HTML_MESSAGE
.692  .52   1.1850  2.6580    0.001                          UNPARSEABLE_RELAY
.922  .58   1.1584 13.7423        0   1.322       0   1.237  RCVD_IN_BL_SPAMCOP_NET
.935  .57   3.5421 50.6034    2.199   0.955   1.215   0.549  MIME_HTML_ONLY
.970  .52   1.5729 51.1430        0     1.1       0     0.7  RDNS_NONE
Note: I hacked the RDNS_NONE entry so that it excludes the Enron hits.
"Problem" rules this week include X_IP, EXTRA_MPART_TYPE, FRT_SOMA2, and
BAD_ENC_HEADER (scored 3.099?!).
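
For anyone who wants to reproduce this kind of check by hand, here is a minimal sketch in Python (not the attached Perl script). It assumes S/O can be approximated as SPAM% / (SPAM% + HAM%), which agrees with the S/O column above to within rounding, and it flags rules that lean toward ham yet carry a sizeable score in any score set. The function names and both thresholds (max_so, min_score) are hypothetical choices for illustration, not the criteria the attachment uses. The rows are copied from the table above.

```python
# Sketch only: the S/O approximation and both thresholds are assumptions,
# not taken from sa33badrules.pl.

# (rule, HAM%, SPAM%, scores listed in attachment 4558), copied from the table
ROWS = [
    ("X_IP", 0.1476, 0.0080, [1.494, 1.699, 1.591, 1.516]),
    ("FRT_SOMA2", 0.3914, 0.0226, [2.205, 0.174, 1.299, 1.806]),
    ("SPF_PASS", 5.1484, 0.3424, [-0.001]),
    ("EXTRA_MPART_TYPE", 1.3973, 0.4438, [1.0]),
    ("BAD_ENC_HEADER", 0.3261, 0.2505, [3.099, 1.823, 1.802, 1.998]),
    ("RDNS_NONE", 1.5729, 51.1430, [0.0, 1.1, 0.0, 0.7]),
]


def s_o(ham_pct, spam_pct):
    """Approximate S/O: share of a rule's hit rate that comes from spam."""
    total = ham_pct + spam_pct
    return spam_pct / total if total else 0.0


def problem_rules(rows, max_so=0.5, min_score=1.0):
    """Return (rule, S/O, highest score) for ham-leaning rules with a high score."""
    flagged = []
    for rule, ham_pct, spam_pct, scores in rows:
        ratio = s_o(ham_pct, spam_pct)
        top = max(scores)
        if ratio < max_so and top >= min_score:
            flagged.append((rule, round(ratio, 3), top))
    return flagged


if __name__ == "__main__":
    for rule, ratio, top in problem_rules(ROWS):
        print(f"{rule:20s} S/O={ratio:.3f}  highest score={top}")
```

With these particular thresholds the sketch flags the same four rules named above and leaves SPF_PASS (negative score) and RDNS_NONE (high S/O) alone. Different cutoffs would of course give a different list.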
Food for thought: while it's good to create workarounds for the problematic
outcomes from the genetic algorithm, I think these should also serve as
examples with which to troubleshoot the algorithm itself. While this might just
be an early sign of over-fitting (which is largely fine as long as we comb
through the results with scripts like this), it might also be indicative of a
problem in the system's prioritization.