Posted to dev@spamassassin.apache.org by Robert Menschel <Ro...@Menschel.net> on 2005/08/21 09:18:55 UTC
Overlap
While looking through SARE rules for overlaps with 3.1.0, I found:
> COUNT PAIR/A PAIR/B A,B
> 51031 1.000 1.000 __OUTLOOK_DOLLARS_MSGID,__OE_MSGID_2
20_ratware.cf:header __OUTLOOK_DOLLARS_MSGID MESSAGEID =~ /^<[0-9a-f]{12}\$[0-9a-f]{8}\$[0-9a-f]{8}\@\S+>$/m
20_ratware.cf:header __OE_MSGID_2 MESSAGEID =~ /^<(?:[0-9a-f]{8}|[0-9a-f]{12})\$[0-9a-f]{8}\$[0-9a-f]{8}\@\S+>$/m
If these two patterns actually do match exactly the same emails, in
other corpora and not just mine, perhaps they should be combined into
one __rule for slightly improved efficiency?
Bob Menschel
Re: Overlap
Posted by Theo Van Dinter <fe...@apache.org>.
On Sun, Aug 21, 2005 at 12:18:55AM -0700, Robert Menschel wrote:
> 20_ratware.cf:header __OUTLOOK_DOLLARS_MSGID MESSAGEID =~ /^<[0-9a-f]{12}\$[0-9a-f]{8}\$[0-9a-f]{8}\@\S+>$/m
> 20_ratware.cf:header __OE_MSGID_2 MESSAGEID =~ /^<(?:[0-9a-f]{8}|[0-9a-f]{12})\$[0-9a-f]{8}\$[0-9a-f]{8}\@\S+>$/m
>
> If these two patterns actually do match exactly the same emails, in
> other corpora and not just mine, perhaps they should be combined into
> one __rule for slightly improved efficiency?
Hrm. If I read the RE correctly, OE_MSGID_2 matches every message that
OUTLOOK_DOLLARS_MSGID matches, and potentially more: it accepts either an
8 or a 12 character hex prefix, where OUTLOOK_DOLLARS_MSGID requires 12.
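A quick way to sanity-check that reading is to run both patterns over a few Message-IDs. The sketch below is only illustrative: it translates the two Perl patterns to Python's re module (the syntax used here behaves the same way in both), and the sample Message-IDs and the classify() helper are made up for the example, not taken from any corpus or from SpamAssassin itself.

```python
import re

# Python translations of the two patterns quoted from 20_ratware.cf.
# re.M stands in for the /m modifier on the original rules.
OUTLOOK_DOLLARS_MSGID = re.compile(
    r'^<[0-9a-f]{12}\$[0-9a-f]{8}\$[0-9a-f]{8}\@\S+>$', re.M)
OE_MSGID_2 = re.compile(
    r'^<(?:[0-9a-f]{8}|[0-9a-f]{12})\$[0-9a-f]{8}\$[0-9a-f]{8}\@\S+>$', re.M)

# Hypothetical Message-IDs, invented purely for illustration.
SAMPLES = {
    "12-char prefix": "<0123456789ab$01234567$89abcdef@example.com>",
    "8-char prefix": "<01234567$01234567$89abcdef@example.com>",
    "no dollars": "<20050821071855.12345@example.com>",
}


def classify(msgid):
    """Return (hits OUTLOOK_DOLLARS_MSGID, hits OE_MSGID_2) for one Message-ID."""
    return (bool(OUTLOOK_DOLLARS_MSGID.search(msgid)),
            bool(OE_MSGID_2.search(msgid)))


for label, msgid in SAMPLES.items():
    print(label, classify(msgid))
```

With these samples, only the 8-char prefix case separates the two patterns. A 1.000/1.000 overlap in a corpus would then just mean that no 8-char-prefix Message-IDs showed up in it.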
OE_MSGID_2 is used in: __FORGED_OE
OUTLOOK_DOLLARS_MSGID is used in: FORGED_MUA_OIMO __FORGED_OUTLOOK_DOLLARS MSGID_DOLLARS
Both are eventually used in: FORGED_MUA_OUTLOOK
So yeah, I think we could try replacing __OUTLOOK_DOLLARS_MSGID with
__OE_MSGID_2, unless someone has a specific reason to have one look for just
12 char prefixes (perhaps a specific version of OE?)
--
Randomly Generated Tagline:
"But these go to 11..." - Spinal Tap