Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2006/11/24 15:42:38 UTC
[Bug 5206] New: RFE: detect and merge duplicate rules for efficiency
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5206
Summary: RFE: detect and merge duplicate rules for efficiency
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Platform: Other
OS/Version: other
Status: NEW
Severity: enhancement
Priority: P5
Component: Libraries
AssignedTo: dev@spamassassin.apache.org
ReportedBy: jm@jmason.org
I just had a case where a meta rule in rulesrc/sandbox/jm depended on a meta
subrule which (it turned out) was in rulesrc/sandbox/dos. (I thought it
was a released rule, but it wasn't ;)
Anyway, for cases like this, it would make sense to copy those "external"
dependencies alongside our meta rule, so they won't get "lost" or accidentally
deleted. This is possible now, but it results in duplication -- both
the original dependency *and* our copy run, separately, even though they're
testing the same thing.
I suggest that for simple regexp rules (like body, header, rawbody, full),
we detect cases during parsing where the rule source is the same:

  header RULE_FOO Foo =~ /bar/
  header RULE_BAZ Foo =~ /bar/

and internally collapse those into one. (An efficient way would be to mark
RULE_BAZ in a duplicates hashtable, e.g.
"$conf->{duplicates}->{RULE_FOO} = [ RULE_BAZ ];" -- then when
got_hit("RULE_FOO") is called, that automatically fires "RULE_BAZ" too.)
This can be done during finish_parsing(), I'd say.
Since that happens at parse time, the only effects are internal; and if someone
later comes along and deletes the source line for RULE_FOO, RULE_BAZ simply
becomes its own rule with no dups and there's no visible change.
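
The idea above can be sketched roughly as follows. This is only an illustration in Python, not SpamAssassin's Perl and not the code that was eventually committed; parse_rules(), find_duplicates() and this standalone got_hit() are hypothetical names, and the rule-line splitting is a simplifying assumption.

```python
from collections import defaultdict

# Rule types the proposal calls "simple regexp rules".
SIMPLE_TYPES = {"body", "header", "rawbody", "full"}


def parse_rules(lines):
    """Return {rule_name: (rule_type, source)} for simple rule lines.

    Assumes each line looks like "<type> <NAME> <source>"; anything else
    is ignored. Real config parsing is more involved than this.
    """
    rules = {}
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0] in SIMPLE_TYPES:
            rule_type, name, source = parts
            rules[name] = (rule_type, source.strip())
    return rules


def find_duplicates(rules):
    """Map the first rule seen for each (type, source) to its later copies."""
    first_seen = {}
    duplicates = defaultdict(list)
    for name, key in rules.items():
        if key in first_seen:
            # Same type and same source as an earlier rule: record as a dup.
            duplicates[first_seen[key]].append(name)
        else:
            first_seen[key] = name
    return dict(duplicates)


def got_hit(name, duplicates, hits):
    """Record a hit for `name` and for every rule merged into it."""
    hits.add(name)
    hits.update(duplicates.get(name, []))
```

With the two header lines above as input, find_duplicates() would map RULE_FOO to a list containing RULE_BAZ, so only RULE_FOO needs to run; with the RULE_FOO line deleted, the map is empty and RULE_BAZ runs as an ordinary rule.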
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 5206] RFE: detect and merge duplicate rules for efficiency
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5206
jm@jmason.org changed:
            What|Removed                     |Added
----------------------------------------------------------------------------
Target Milestone|Undefined                   |3.2.0
------- Additional Comments From jm@jmason.org 2006-12-06 08:51 -------
I'm going to try and do this for 3.2.0
[Bug 5206] RFE: detect and merge duplicate rules for efficiency
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5206
jm@jmason.org changed:
            What|Removed                     |Added
----------------------------------------------------------------------------
          Status|NEW                         |RESOLVED
      Resolution|                            |FIXED
------- Additional Comments From jm@jmason.org 2006-12-07 12:38 -------
yay. done:
: jm 215...; svn commit -m "bug 5206: detect duplicate rules, and silently merge
them internally for greater efficiency. This results in about 100-120KB RAM
usage saving in current svn trunk's ruleset, detecting lots of duplicate rules
-- so is well worth doing. also, change t/priorities.t so it doesn't
accidentally confuse itself with duplicate rules"
Sending MANIFEST
Sending lib/Mail/SpamAssassin/Conf/Parser.pm
Sending lib/Mail/SpamAssassin/Conf.pm
Sending lib/Mail/SpamAssassin/PerMsgStatus.pm
Sending masses/mass-check
Adding masses/plugins/Dumpmem.pm
Adding t/duplicates.t
Sending t/priorities.t
Transmitting file data ........
Committed revision 483650.
[Bug 5206] RFE: detect and merge duplicate rules for efficiency
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5206
------- Additional Comments From lwilton@earthlink.net 2006-11-24 16:37 -------
This might have wider applicability than just the case you mention. SARE has
more than once made a rule as a meta component only to discover that it exists
in the main rules or in some other rules file. It might be nice to have a
--warn-rules option that would turn on warning messages for things like detected
duplicate rules (and missing meta dependencies, and all the other nitty rule
warnings that people are now complaining about).