Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/04/14 08:17:46 UTC

[Bug 3269] New: rules that broke since 2.6x

http://bugzilla.spamassassin.org/show_bug.cgi?id=3269

           Summary: rules that broke since 2.6x
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Libraries
        AssignedTo: spamassassin-dev@incubator.apache.org
        ReportedBy: quinlan@pathname.com


I did a mass-check with both 2.6x and SVN HEAD and compared rules that
appear in both trees and hit more than 1% of spam with 0.9 S/O or better
in 2.6x.  Here are the 10 rules that lost the most spam hits out of 14999
spam messages:

  -63     TRACKER_ID
  -140    FROM_ALL_NUMS
  -174    MSGID_DOLLARS
  -206    MIME_BASE64_TEXT
  -228    MIME_BASE64_NO_NAME
  -361    HTML_60_70
  -389    HTML_TAG_BALANCE_HTML
  -478    HTML_TAG_BALANCE_BODY
  -617    HTML_50_60
  -760    MIME_QP_LONG_LINE
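
The selection described above can be sketched roughly as follows. This is
only an illustration of the filter-and-rank step, not the script actually
used: the RuleStats type, the biggest_losers function, and the idea of
holding per-rule hit counts in dictionaries are all assumptions, and the
ham corpus size must be supplied by the caller since it is not stated here.

```python
from typing import Dict, List, NamedTuple, Tuple


class RuleStats(NamedTuple):
    """Hypothetical per-rule hit counts from one mass-check run."""
    spam_hits: int
    ham_hits: int


def s_o(spam_pct: float, ham_pct: float) -> float:
    """Spam/overall ratio computed from the two hit percentages."""
    total = spam_pct + ham_pct
    return spam_pct / total if total else 0.0


def biggest_losers(
    old: Dict[str, RuleStats],
    new: Dict[str, RuleStats],
    n_spam: int,
    n_ham: int,
    min_spam_pct: float = 1.0,
    min_so: float = 0.9,
    top: int = 10,
) -> List[Tuple[int, str]]:
    """Return (delta, rule) pairs for the rules that lost the most spam hits.

    Only rules present in both trees are considered, and only those that
    hit more than min_spam_pct of spam with an S/O of at least min_so in
    the OLD tree.
    """
    candidates = []
    for name in old.keys() & new.keys():
        spam_pct = 100.0 * old[name].spam_hits / n_spam
        ham_pct = 100.0 * old[name].ham_hits / n_ham
        if spam_pct > min_spam_pct and s_o(spam_pct, ham_pct) >= min_so:
            delta = new[name].spam_hits - old[name].spam_hits
            candidates.append((delta, name))
    candidates.sort()  # most negative delta (largest loss) first
    return candidates[:top]
```

The list above is printed smallest loss first; reversing the returned
list gives that ordering.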

I'm ignoring the HTML percentage rules (HTML_50_60, HTML_60_70) since
that's just the ranges changing.  Other ranges gained hits.

For the remaining rules, here are the before/after hit-frequencies
(columns: OVERALL%, SPAM%, HAM%, S/O, RANK, SCORE, NAME; the suffix on
NAME is the version tested):

  1.081   2.1601   0.0000    1.000   0.93    2.17  FROM_ALL_NUMS:2.6
  0.614   1.2267   0.0000    1.000   0.93    2.17  FROM_ALL_NUMS:3.0

Definitely not an improvement for me; we should retest the old rule.

  7.108  14.0076   0.2003    0.986   0.91    0.35  HTML_TAG_BALANCE_BODY:2.6
  5.427  10.8207   0.0267    0.998   0.94    0.35  HTML_TAG_BALANCE_BODY:3.0

Looks like an overall improvement.

  3.112   5.9604   0.2603    0.958   0.83    0.67  HTML_TAG_BALANCE_HTML:2.6
  1.691   3.3669   0.0134    0.996   0.92    0.67  HTML_TAG_BALANCE_HTML:3.0

Looks like an overall improvement.

  2.959   5.7470   0.1669    0.972   0.86    0.19  MIME_BASE64_NO_NAME:2.6
  2.165   4.2269   0.1001    0.977   0.87    0.19  MIME_BASE64_NO_NAME:3.0

Hmmm... tough call.

  2.832   5.6004   0.0601    0.989   0.91    1.10  MIME_BASE64_TEXT:2.6
  2.135   4.2269   0.0401    0.991   0.91    1.10  MIME_BASE64_TEXT:3.0

Hmmm... tough call.

  3.286   6.0871   0.4806    0.927   0.75    0.24  MIME_QP_LONG_LINE:2.6
  0.570   1.0201   0.1202    0.895   0.67    0.24  MIME_QP_LONG_LINE:3.0

Wow, something bad happened here.  Are we decoding these away?

  3.756   7.3005   0.2069    0.972   0.86    1.00  MSGID_DOLLARS:2.6
  3.072   6.1404   0.0000    1.000   0.94    1.00  MSGID_DOLLARS:3.0

Looks like a very nice improvement!

  1.024   2.0401   0.0067    0.997   0.92    2.53  TRACKER_ID:2.6
  0.814   1.6201   0.0067    0.996   0.92    2.53  TRACKER_ID:3.0

Looks like this was also a loss, but I recall some bug was fixed.  Maybe
retest anyway?

So, these are the ones that we might want to revisit:

  1.081   2.1601   0.0000    1.000   0.93    2.17  FROM_ALL_NUMS:2.6
  0.614   1.2267   0.0000    1.000   0.93    2.17  FROM_ALL_NUMS:3.0
  3.286   6.0871   0.4806    0.927   0.75    0.24  MIME_QP_LONG_LINE:2.6
  0.570   1.0201   0.1202    0.895   0.67    0.24  MIME_QP_LONG_LINE:3.0
  1.024   2.0401   0.0067    0.997   0.92    2.53  TRACKER_ID:2.6
  0.814   1.6201   0.0067    0.996   0.92    2.53  TRACKER_ID:3.0
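
As a cross-check, the hit-frequency rows for these three rules can be tied
back to the lost-hit counts in the first list. The sketch below assumes
that the second and third numeric columns are the spam and ham hit
percentages, that S/O is spam% / (spam% + ham%), and that the spam
percentage is taken over the 14999 spam messages; the figures here appear
consistent with all three assumptions. The names ROWS, spam_hits, and
lost_hits are made up for illustration.

```python
N_SPAM = 14999  # spam corpus size quoted in this report

# (spam %, ham %) copied from the hit-frequency rows above
ROWS = {
    "FROM_ALL_NUMS:2.6": (2.1601, 0.0000),
    "FROM_ALL_NUMS:3.0": (1.2267, 0.0000),
    "MIME_QP_LONG_LINE:2.6": (6.0871, 0.4806),
    "MIME_QP_LONG_LINE:3.0": (1.0201, 0.1202),
    "TRACKER_ID:2.6": (2.0401, 0.0067),
    "TRACKER_ID:3.0": (1.6201, 0.0067),
}


def s_o(spam_pct: float, ham_pct: float) -> float:
    """Spam/overall ratio, assumed to be spam% / (spam% + ham%)."""
    return spam_pct / (spam_pct + ham_pct)


def spam_hits(spam_pct: float, n_spam: int = N_SPAM) -> int:
    """Convert a spam hit percentage back to a whole number of messages."""
    return round(spam_pct * n_spam / 100)


def lost_hits(rule: str) -> int:
    """Change in spam hits for a rule going from 2.6 to 3.0."""
    before = spam_hits(ROWS[rule + ":2.6"][0])
    after = spam_hits(ROWS[rule + ":3.0"][0])
    return after - before


if __name__ == "__main__":
    for rule in ("FROM_ALL_NUMS", "MIME_QP_LONG_LINE", "TRACKER_ID"):
        so_old = s_o(*ROWS[rule + ":2.6"])
        so_new = s_o(*ROWS[rule + ":3.0"])
        print(f"{rule:20s} {lost_hits(rule):6d}  S/O {so_old:.3f} -> {so_new:.3f}")
```

For MIME_QP_LONG_LINE, the drop in S/O alongside the drop in hits shows
that the rule lost spam hits faster than it lost ham hits.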

Incidentally, these prerequisite rules lost a heck of a lot of hits:

  -712    __MIME_BASE64
  -4570   __MIME_QP

That seems related to the huge drop in MIME_QP_LONG_LINE and the small
drops in the MIME_BASE64 rules.


