Posted to commits@spamassassin.apache.org by Apache Wiki <wi...@apache.org> on 2005/07/25 07:35:46 UTC

[Spamassassin Wiki] Update of "RulesProjStreamlining" by BobMenschel

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Spamassassin Wiki" for change notification.

The following page has been changed by BobMenschel:
http://wiki.apache.org/spamassassin/RulesProjStreamlining

The comment on the change is:
Added suggestion for dealing with overlap

------------------------------------------------------------------------------
    * > 0.25% of target type hit (e.g. spam for non-nice rules)
    * < 1.00% of non-target type hit (e.g. ham for non-nice rules)
    * not too slow ;)
-   * TODO: criteria for overlap with existing rules?
+   * TODO: criteria for overlap with existing rules? BobMenschel: To weed out SARE rules that overlapped 3.0.0 rules, I ran a full mass-check with overlap analysis and discarded any result where the overlap was below 50%. Manually reviewing the remaining (significantly) overlapping rules was fairly easy. The command I use is: perl ./overlap ../rules/tested/$testfile.ham.log ../rules/tested/$testfile.spam.log | grep -v mid= | awk ' NR == 1 { print } ; $2 + 0 == 1.000 && $3 + 0 >= 0.500 { print } ' >../rules/tested/$testfile.overlap.out
  
  A ruleset in the "extra" set would have different criteria.
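
For readers who want to try the filtering step from BobMenschel's note on its own, here is a minimal sketch that wraps the grep and awk stages in a shell function. The function name filter_overlap and the sample input are hypothetical. The real column layout of the overlap script's output is not shown in the note, so the sample only assumes what the awk expression itself implies: the first line is a header, and fields 2 and 3 are numeric ratios.

```shell
# Hypothetical wrapper around the grep/awk stages of the command in the note.
# It reads overlap output on stdin and writes the filtered lines to stdout.
filter_overlap() {
    # Drop lines containing "mid=", then keep:
    #   - the first remaining line (assumed to be a header), and
    #   - lines whose 2nd field equals 1.000 and whose 3rd field is >= 0.500.
    grep -v 'mid=' | awk '
        NR == 1 { print }
        $2 + 0 == 1.000 && $3 + 0 >= 0.500 { print }
    '
}

# Invented sample input, for illustration only (not real overlap output).
filter_overlap <<'EOF'
H1 H2 H3 RULES
x 1.000 0.750 RULE_A RULE_B
x 1.000 0.250 RULE_C RULE_D
x 0.900 0.800 RULE_E RULE_F
x 1.000 0.900 mid=abc
EOF
```

In actual use the function would sit at the end of the pipeline in place of the inline grep and awk, with stdout redirected to the .overlap.out file as in the original command.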