Posted to dev@spamassassin.apache.org by Robert Menschel <Ro...@Menschel.net> on 2005/07/25 15:34:39 UTC

Re[2]: Hackathon summary

Hello Daniel,

Sunday, July 24, 2005, 11:34:33 PM, you wrote:

DQ> Robert Menschel <Ro...@Menschel.net> writes:

>>> TODO: criteria for overlap with existing rules?
>>> BobMenschel: The method I used for weeding out SARE rules that
>>> overlapped 3.0.0 rules, was to run a full mass-check with overlap
>>> analysis, and throw away anything where the overlap is less than
>>> 50%.

DQ> By "throw away", do you mean put into the bucket that is retained going
DQ> forward or did you mean to say "greater than 50%"?

By "throw away anything where the overlap is less than 50%" I meant
to discard (exclude from the final file) anything where the overlap
was (IMO) insignificant.

This would leave those overlaps where RULE_A hit all the emails that
RULE_B also hit (100%), and RULE_B hit somewhere between 50% and 100%
of the emails that RULE_A hit.
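
A minimal sketch of that criterion, assuming each rule's hits are
available as a set of message IDs. The function names and the sample
data are hypothetical, and this is not the mass-check overlap tooling
itself:

```python
def overlap_pct(hits_x, hits_y):
    """Percentage of the messages in hits_y that are also in hits_x."""
    if not hits_y:
        return 0.0
    return 100.0 * len(hits_x & hits_y) / len(hits_y)


def significant_overlap(hits_a, hits_b, threshold=50.0):
    """True when RULE_A hit everything RULE_B hit, and RULE_B hit at
    least `threshold` percent of what RULE_A hit."""
    if not hits_b:
        return False
    a_covers_b = hits_b <= hits_a  # RULE_A hit 100% of RULE_B's emails
    return a_covers_b and overlap_pct(hits_b, hits_a) >= threshold


# Hypothetical message IDs hit by each rule.
rule_a = {1, 2, 3, 4}
rule_b = {1, 2, 3}
print(significant_overlap(rule_a, rule_b))
```

Here RULE_B hits three of RULE_A's four messages (75%), so the pair
counts as a significant overlap under the 50% cutoff.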

It'd also be good to identify overlaps where RULE_A hit 90% of what
RULE_B hit, and RULE_B hit 90% of what RULE_A hit, but neither hit
100% of the other's ...
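
One way such near-duplicate pairs might be found, again assuming hit
sets keyed by rule name. The 90% default, the function name, and the
sample data are illustrative assumptions only:

```python
from itertools import combinations


def near_duplicates(hits_by_rule, threshold=90.0):
    """Return rule-name pairs where each rule hits at least `threshold`
    percent of the other's messages, but neither covers the other fully."""
    pairs = []
    for name_a, name_b in combinations(sorted(hits_by_rule), 2):
        a, b = hits_by_rule[name_a], hits_by_rule[name_b]
        if not a or not b:
            continue  # a rule with no hits cannot overlap anything
        common = len(a & b)
        a_of_b = 100.0 * common / len(b)  # share of b's hits that a also hit
        b_of_a = 100.0 * common / len(a)  # share of a's hits that b also hit
        partial = common < len(a) and common < len(b)
        if partial and a_of_b >= threshold and b_of_a >= threshold:
            pairs.append((name_a, name_b))
    return pairs


# Hypothetical hit sets: RULE_A and RULE_B share 19 of their 20 messages.
hits = {
    "RULE_A": set(range(0, 20)),
    "RULE_B": set(range(1, 21)),
    "RULE_C": set(range(0, 5)),
}
print(near_duplicates(hits))
```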

Bob Menschel