Posted to dev@spamassassin.apache.org by Robert Menschel <Ro...@Menschel.net> on 2005/07/25 15:34:39 UTC
Re[2]: Hackathon summary
Hello Daniel,
Sunday, July 24, 2005, 11:34:33 PM, you wrote:
DQ> Robert Menschel <Ro...@Menschel.net> writes:
>>> TODO: criteria for overlap with existing rules?
>>> BobMenschel: The method I used for weeding out SARE rules that
>>> overlapped 3.0.0 rules, was to run a full mass-check with overlap
>>> analysis, and throw away anything where the overlap is less than
>>> 50%.
DQ> By "throw away", do you mean put into the bucket that is retained going
DQ> forward or did you mean to say "greater than 50%"?
By "throw away anything where the overlap is less than 50%" I meant
to discard (exclude from the final file) anything where the overlap
was (IMO) insignificant.
This would leave those overlaps where RULE_A hit all the emails that
RULE_B also hit (100%), and RULE_B hit somewhere between 50% and 100%
of the emails that RULE_A hit.
It'd also be good to identify overlaps where RULE_A hit 90% of what
RULE_B hit, and RULE_B hit 90% of what RULE_A hit, but neither hit
100% of the other's ...
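
As a rough illustration of the two criteria above, here is a minimal
Python sketch. It assumes each rule's hits are already available as a
set of message IDs. Parsing mass-check output is not shown, and the
names overlap_pct and classify_overlap, the label strings, and the
threshold parameters are hypothetical, not part of any SpamAssassin
tool.

```python
def overlap_pct(hits_a, hits_b):
    """Percentage of the messages in hits_b that are also in hits_a."""
    if not hits_b:
        return 0.0
    return 100.0 * len(hits_a & hits_b) / len(hits_b)


def classify_overlap(hits_a, hits_b, keep_threshold=50.0, mutual_threshold=90.0):
    """Label the overlap between two rules' hit sets (hypothetical labels)."""
    a_of_b = overlap_pct(hits_a, hits_b)  # share of RULE_B's hits that RULE_A also hit
    b_of_a = overlap_pct(hits_b, hits_a)  # share of RULE_A's hits that RULE_B also hit

    # One rule hits everything the other hits, and the other hits at
    # least keep_threshold percent in return.
    if hits_b and hits_b <= hits_a and b_of_a >= keep_threshold:
        return "a_covers_b"
    if hits_a and hits_a <= hits_b and a_of_b >= keep_threshold:
        return "b_covers_a"

    # Neither rule fully covers the other, but each hits at least
    # mutual_threshold percent of the other's messages.
    if min(a_of_b, b_of_a) >= mutual_threshold:
        return "mutual"

    return "insignificant"
```

For example, if RULE_A hits messages 0 through 9 and RULE_B hits
messages 0 through 5, the pair is labelled "a_covers_b"; if RULE_B
hits only two of those ten messages, the pair is labelled
"insignificant" and would be thrown away under the 50% cutoff.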
Bob Menschel