Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/04/14 08:17:46 UTC
[Bug 3269] New: rules that broke since 2.6x
http://bugzilla.spamassassin.org/show_bug.cgi?id=3269
Summary: rules that broke since 2.6x
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Platform: Other
OS/Version: other
Status: NEW
Severity: normal
Priority: P5
Component: Libraries
AssignedTo: spamassassin-dev@incubator.apache.org
ReportedBy: quinlan@pathname.com
I did a mass-check with both 2.6x and SVN HEAD and compared rules that
appear in both trees and hit more than 1% of spam with 0.9 S/O or better
in 2.6x. Here are the 10 rules that lost the most spam hits out of 14999
spam messages:
-63 TRACKER_ID
-140 FROM_ALL_NUMS
-174 MSGID_DOLLARS
-206 MIME_BASE64_TEXT
-228 MIME_BASE64_NO_NAME
-361 HTML_60_70
-389 HTML_TAG_BALANCE_HTML
-478 HTML_TAG_BALANCE_BODY
-617 HTML_50_60
-760 MIME_QP_LONG_LINE
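A minimal sketch of the selection described above, not the actual script behind this report. The `RuleStats` tuple and `biggest_losers` function are hypothetical names, and the three sample rules use the SPAM% and S/O figures from the hit-frequencies lines later in this message:

```python
from typing import NamedTuple


class RuleStats(NamedTuple):
    spam_pct: float  # percent of spam messages the rule hit
    so: float        # S/O ratio


def biggest_losers(old, new, total_spam, min_spam_pct=1.0, min_so=0.9, top=10):
    """Rank rules present in both trees by spam hits lost (hypothetical helper)."""
    losses = []
    for name, o in old.items():
        if name not in new:
            continue  # only compare rules that appear in both trees
        if o.spam_pct <= min_spam_pct or o.so < min_so:
            continue  # must hit >1% of spam with 0.9 S/O or better in 2.6x
        delta = round((new[name].spam_pct - o.spam_pct) * total_spam / 100)
        losses.append((delta, name))
    worst = sorted(losses)[:top]        # most negative deltas first
    return sorted(worst, reverse=True)  # list smallest loss first, as above


# Sample figures copied from the hit-frequencies output in this report.
old = {
    "FROM_ALL_NUMS": RuleStats(2.1601, 1.000),
    "TRACKER_ID": RuleStats(2.0401, 0.997),
    "MIME_QP_LONG_LINE": RuleStats(6.0871, 0.927),
}
new = {
    "FROM_ALL_NUMS": RuleStats(1.2267, 1.000),
    "TRACKER_ID": RuleStats(1.6201, 0.996),
    "MIME_QP_LONG_LINE": RuleStats(1.0201, 0.895),
}

for delta, name in biggest_losers(old, new, total_spam=14999):
    print(delta, name)
```

Only the 2.6x figures are tested against the thresholds, since the point is to find rules that were good in 2.6x and got worse.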
I'm ignoring the HTML percentage rules (HTML_50_60 and HTML_60_70) since
that's just the ranges changing. Other ranges gained hits.
For the remaining rules, here are the before/after hit-frequencies
(columns: OVERALL%, SPAM%, HAM%, S/O, RANK, SCORE, NAME):
1.081 2.1601 0.0000 1.000 0.93 2.17 FROM_ALL_NUMS:2.6
0.614 1.2267 0.0000 1.000 0.93 2.17 FROM_ALL_NUMS:3.0
Definitely not an improvement for me; we should retest the old rule.
7.108 14.0076 0.2003 0.986 0.91 0.35 HTML_TAG_BALANCE_BODY:2.6
5.427 10.8207 0.0267 0.998 0.94 0.35 HTML_TAG_BALANCE_BODY:3.0
Looks like an overall improvement.
3.112 5.9604 0.2603 0.958 0.83 0.67 HTML_TAG_BALANCE_HTML:2.6
1.691 3.3669 0.0134 0.996 0.92 0.67 HTML_TAG_BALANCE_HTML:3.0
Looks like an overall improvement.
2.959 5.7470 0.1669 0.972 0.86 0.19 MIME_BASE64_NO_NAME:2.6
2.165 4.2269 0.1001 0.977 0.87 0.19 MIME_BASE64_NO_NAME:3.0
Hmmm... tough call.
2.832 5.6004 0.0601 0.989 0.91 1.10 MIME_BASE64_TEXT:2.6
2.135 4.2269 0.0401 0.991 0.91 1.10 MIME_BASE64_TEXT:3.0
Hmmm... tough call.
3.286 6.0871 0.4806 0.927 0.75 0.24 MIME_QP_LONG_LINE:2.6
0.570 1.0201 0.1202 0.895 0.67 0.24 MIME_QP_LONG_LINE:3.0
Wow, something bad happened here. Are we decoding these away?
3.756 7.3005 0.2069 0.972 0.86 1.00 MSGID_DOLLARS:2.6
3.072 6.1404 0.0000 1.000 0.94 1.00 MSGID_DOLLARS:3.0
Looks like a very nice improvement!
1.024 2.0401 0.0067 0.997 0.92 2.53 TRACKER_ID:2.6
0.814 1.6201 0.0067 0.996 0.92 2.53 TRACKER_ID:3.0
Looks like this was also a loss, but I recall some bug was fixed. Maybe
retest anyway?
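As a cross-check on the figures above, here is a rough sketch that parses a pair of hit-frequencies lines and recomputes S/O and the number of spam hits lost. The column order (OVERALL%, SPAM%, HAM%, S/O, RANK, SCORE, NAME) is an assumption about the output format, and `parse_line`, `so_ratio` and `spam_hits` are hypothetical helper names:

```python
TOTAL_SPAM = 14999  # spam messages in the mass-check, from this report


def parse_line(line):
    """Split one hit-frequencies line (assumed column order, see above)."""
    overall, spam, ham, so, rank, score, name = line.split()
    rule, _, version = name.partition(":")
    return {
        "rule": rule,
        "version": version,
        "spam_pct": float(spam),
        "ham_pct": float(ham),
        "so": float(so),
    }


def so_ratio(spam_pct, ham_pct):
    """S/O: share of a rule's hits that are spam, from the two percentages."""
    # Not defined for a rule that hits nothing (both percentages zero).
    return spam_pct / (spam_pct + ham_pct)


def spam_hits(spam_pct, total=TOTAL_SPAM):
    """Convert a SPAM% figure back into a message count."""
    return round(spam_pct * total / 100)


before = parse_line("3.286 6.0871 0.4806 0.927 0.75 0.24 MIME_QP_LONG_LINE:2.6")
after = parse_line("0.570 1.0201 0.1202 0.895 0.67 0.24 MIME_QP_LONG_LINE:3.0")

lost = spam_hits(after["spam_pct"]) - spam_hits(before["spam_pct"])
print(before["rule"], "hits lost:", lost)
print("S/O 2.6:", round(so_ratio(before["spam_pct"], before["ham_pct"]), 3))
print("S/O 3.0:", round(so_ratio(after["spam_pct"], after["ham_pct"]), 3))
```

For MIME_QP_LONG_LINE this reproduces the -760 in the list of lost hits and the 0.927 and 0.895 S/O values in the lines above.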
So, these are the ones that we might want to revisit:
1.081 2.1601 0.0000 1.000 0.93 2.17 FROM_ALL_NUMS:2.6
0.614 1.2267 0.0000 1.000 0.93 2.17 FROM_ALL_NUMS:3.0
3.286 6.0871 0.4806 0.927 0.75 0.24 MIME_QP_LONG_LINE:2.6
0.570 1.0201 0.1202 0.895 0.67 0.24 MIME_QP_LONG_LINE:3.0
1.024 2.0401 0.0067 0.997 0.92 2.53 TRACKER_ID:2.6
0.814 1.6201 0.0067 0.996 0.92 2.53 TRACKER_ID:3.0
Incidentally, these prerequisite rules lost a heck of a lot of hits:
-712 __MIME_BASE64
-4570 __MIME_QP
That seems related to the huge drop in MIME_QP_LONG_LINE and the small
drops in the MIME_BASE64 rules.