Posted to dev@spamassassin.apache.org by Robert Menschel <Ro...@Menschel.net> on 2005/08/21 09:18:55 UTC

Overlap

While looking through SARE rules for overlaps with 3.1.0, I found:

> COUNT   PAIR/A  PAIR/B  A,B
> 51031   1.000   1.000   __OUTLOOK_DOLLARS_MSGID,__OE_MSGID_2

20_ratware.cf:header __OUTLOOK_DOLLARS_MSGID  MESSAGEID =~ /^<[0-9a-f]{12}\$[0-9a-f]{8}\$[0-9a-f]{8}\@\S+>$/m
20_ratware.cf:header __OE_MSGID_2             MESSAGEID =~ /^<(?:[0-9a-f]{8}|[0-9a-f]{12})\$[0-9a-f]{8}\$[0-9a-f]{8}\@\S+>$/m

If these two patterns actually do match exactly the same emails, in
other corpora and not just mine, perhaps they should be combined into
one __rule for slightly improved efficiency?
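One way to check whether the two rules hit exactly the same messages is to rerun the overlap count on another set of Message-IDs. The sketch below is a minimal illustration, not the SpamAssassin masses tooling. It assumes that PAIR/A is the fraction of rule A's hits that rule B also hit (and PAIR/B the reverse), so 1.000/1.000 means identical hit sets. The `overlap` helper and the sample Message-IDs are hypothetical. SpamAssassin evaluates these rules as Perl regexes; here they are transcribed to Python's `re`, with `\@` written as a plain `@`.

```python
import re

# Rules from 20_ratware.cf, transcribed to Python regex syntax (assumption:
# these constructs behave the same as in Perl).
OUTLOOK_DOLLARS_MSGID = re.compile(
    r"^<[0-9a-f]{12}\$[0-9a-f]{8}\$[0-9a-f]{8}@\S+>$", re.M)
OE_MSGID_2 = re.compile(
    r"^<(?:[0-9a-f]{8}|[0-9a-f]{12})\$[0-9a-f]{8}\$[0-9a-f]{8}@\S+>$", re.M)


def overlap(rule_a, rule_b, message_ids):
    """Hypothetical helper: return (count of A-and-B hits, PAIR/A, PAIR/B)."""
    hits_a = {i for i, m in enumerate(message_ids) if rule_a.search(m)}
    hits_b = {i for i, m in enumerate(message_ids) if rule_b.search(m)}
    both = hits_a & hits_b
    pair_a = len(both) / len(hits_a) if hits_a else 0.0
    pair_b = len(both) / len(hits_b) if hits_b else 0.0
    return len(both), pair_a, pair_b


# Made-up sample Message-IDs; a real check would read them from a corpus.
sample = [
    "<000a01c5a5f2$1b2c3d40$0100a8c0@example.com>",
    "<20050821.091855.abc@example.org>",
]
print(overlap(OUTLOOK_DOLLARS_MSGID, OE_MSGID_2, sample))  # (1, 1.0, 1.0)
```

A corpus in which both ratios stay at 1.0 supports merging the rules; any ratio below 1.0 means one rule hits messages the other misses.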

Bob Menschel




Re: Overlap

Posted by Theo Van Dinter <fe...@apache.org>.
On Sun, Aug 21, 2005 at 12:18:55AM -0700, Robert Menschel wrote:
> 20_ratware.cf:header __OUTLOOK_DOLLARS_MSGID  MESSAGEID =~ /^<[0-9a-f]{12}\$[0-9a-f]{8}\$[0-9a-f]{8}\@\S+>$/m
> 20_ratware.cf:header __OE_MSGID_2             MESSAGEID =~ /^<(?:[0-9a-f]{8}|[0-9a-f]{12})\$[0-9a-f]{8}\$[0-9a-f]{8}\@\S+>$/m
> 
> If these two patterns actually do match exactly the same emails, in
> other corpora and not just mine, perhaps they should be combined into
> one __rule for slightly improved efficiency?

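Hrm.  If I read the RE correctly, OE_MSGID_2 can potentially match more
messages than OUTLOOK_DOLLARS_MSGID, and will always match every message
that OUTLOOK_DOLLARS_MSGID matches: its leading hex prefix may be either 8
or 12 characters long, while OUTLOOK_DOLLARS_MSGID accepts only 12.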
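The superset relationship can be seen directly: OE_MSGID_2 is the OUTLOOK_DOLLARS_MSGID pattern with its 12-character prefix widened to an alternation of 8 or 12 characters. The sketch below demonstrates this on two made-up Message-IDs, assuming the Perl rules transcribe unchanged into Python's `re` (with `\@` written as `@`). The `classify` helper and the sample IDs are hypothetical; only the 8-character one distinguishes the two rules.

```python
import re

# Rules from 20_ratware.cf, transcribed to Python regex syntax.
OUTLOOK_DOLLARS_MSGID = re.compile(
    r"^<[0-9a-f]{12}\$[0-9a-f]{8}\$[0-9a-f]{8}@\S+>$", re.M)
OE_MSGID_2 = re.compile(
    r"^<(?:[0-9a-f]{8}|[0-9a-f]{12})\$[0-9a-f]{8}\$[0-9a-f]{8}@\S+>$", re.M)

# Hypothetical Message-IDs with a 12-character and an 8-character prefix.
TWELVE = "<000a01c5a5f2$1b2c3d40$0100a8c0@example.com>"
EIGHT = "<01c5a5f2$1b2c3d40$0100a8c0@example.com>"


def classify(msgid):
    """Return (OUTLOOK_DOLLARS_MSGID hit, OE_MSGID_2 hit) for one Message-ID."""
    a = bool(OUTLOOK_DOLLARS_MSGID.search(msgid))
    b = bool(OE_MSGID_2.search(msgid))
    # Whenever the narrower rule hits, the wider rule must hit too.
    assert b or not a
    return a, b


for msgid in (TWELVE, EIGHT):
    print(msgid, classify(msgid))
```

The 12-character ID hits both rules; the 8-character ID hits only OE_MSGID_2, which is why replacing one rule with the other would broaden the match set rather than leave it unchanged.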

OE_MSGID_2 is used in: __FORGED_OE
OUTLOOK_DOLLARS_MSGID is used in: FORGED_MUA_OIMO __FORGED_OUTLOOK_DOLLARS MSGID_DOLLARS
Both are eventually used in: FORGED_MUA_OUTLOOK

So yeah, I think we could try replacing __OUTLOOK_DOLLARS_MSGID with
__OE_MSGID_2, unless someone has a specific reason to have one rule look for
only 12-character prefixes (perhaps a specific version of OE?).
