Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2006/11/24 15:42:38 UTC

[Bug 5206] New: RFE: detect and merge duplicate rules for efficiency

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5206

           Summary: RFE: detect and merge duplicate rules for efficiency
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: Libraries
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: jm@jmason.org


I just had a case where a meta rule in rulesrc/sandbox/jm depended on a meta
subrule which (it turned out) was in rulesrc/sandbox/dos.  (I thought it
was a released rule, but it wasn't ;)

Anyway, for cases like this, it would make sense to copy those "external"
dependencies alongside our meta rule, so they won't get "lost" or accidentally
deleted.  This is possible now, but it results in duplication -- both
the original dependency *and* our copy run, separately, even though they're
testing the same thing.

I suggest that for simple regexp rules (like body, header, rawbody, full),
we detect cases during parsing where the rule source is the same:

    header RULE_FOO     Foo =~ /bar/
    header RULE_BAZ     Foo =~ /bar/

and internally collapse those into one.  (An efficient way would be to mark
RULE_BAZ in a duplicates hashtable, e.g. "$conf->{duplicates}->{RULE_FOO} = [
'RULE_BAZ' ];" -- then when got_hit("RULE_FOO") is called, it automatically
records a hit for "RULE_BAZ" too.)  This can be done during finish_parsing(), I'd say.

Since that happens at parse time, the only effects are internal; and if someone
later comes along and deletes the source line for RULE_FOO, RULE_BAZ simply
becomes its own rule with no dups and there's no visible change.
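
A minimal sketch of the proposed merge follows. It is written in Python purely for illustration. SpamAssassin itself is Perl, and this is not its code. The sketch assumes rules arrive as a dict of name -> (type, source) in definition order. `find_duplicates`, `MERGEABLE_TYPES` and that dict layout are hypothetical, and only the rule type and source text are compared. `got_hit` mirrors the behaviour described above: a hit on the surviving rule is also recorded for every rule merged into it.

```python
# Hypothetical sketch: merge rules whose type and source text are identical.
# Only the simple regexp rule types named in the proposal are merged.
MERGEABLE_TYPES = {"body", "header", "rawbody", "full"}


def find_duplicates(rules):
    """rules: dict of name -> (rule_type, source_text), in definition order.

    Returns (active, duplicates): `active` is the set of rule names that
    would actually be evaluated; `duplicates` maps each surviving rule to
    the names merged into it.
    """
    first_seen = {}  # (rule_type, source_text) -> first rule defined with it
    duplicates = {}
    active = set()
    for name, (rtype, source) in rules.items():
        if rtype not in MERGEABLE_TYPES:
            active.add(name)  # e.g. meta rules are never merged in this sketch
            continue
        key = (rtype, source.strip())
        if key in first_seen:
            duplicates.setdefault(first_seen[key], []).append(name)
        else:
            first_seen[key] = name
            active.add(name)
    return active, duplicates


def got_hit(name, hits, duplicates):
    """Record a hit on `name` and on every rule merged into it."""
    hits.add(name)
    hits.update(duplicates.get(name, []))


rules = {
    "RULE_FOO": ("header", "Foo =~ /bar/"),
    "RULE_BAZ": ("header", "Foo =~ /bar/"),
    "RULE_QUX": ("body", "/bar/"),
}
active, dups = find_duplicates(rules)
print(sorted(active), dups)  # ['RULE_FOO', 'RULE_QUX'] {'RULE_FOO': ['RULE_BAZ']}

hits = set()
got_hit("RULE_FOO", hits, dups)
print(sorted(hits))  # ['RULE_BAZ', 'RULE_FOO']

# Deleting RULE_FOO's source line and re-parsing leaves RULE_BAZ standalone.
del rules["RULE_FOO"]
active, dups = find_duplicates(rules)
print(sorted(active), dups)  # ['RULE_BAZ', 'RULE_QUX'] {}
```

Duplicates are recomputed from the parsed rule set each time, so nothing about the merge is persisted. Removing the original rule simply turns its former duplicate back into an ordinary rule.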



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5206] RFE: detect and merge duplicate rules for efficiency

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5206


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|Undefined                   |3.2.0




------- Additional Comments From jm@jmason.org  2006-12-06 08:51 -------
I'm going to try and do this for 3.2.0




[Bug 5206] RFE: detect and merge duplicate rules for efficiency



jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




------- Additional Comments From jm@jmason.org  2006-12-07 12:38 -------
yay.  done:

: jm 215...; svn commit -m "bug 5206: detect duplicate rules, and silently merge
them internally for greater efficiency.  This results in about 100-120KB RAM
usage saving in current svn trunk's ruleset, detecting lots of duplicate rules
-- so is well worth doing.  also, change t/priorities.t so it doesn't
accidentally confuse itself with duplicate rules"
Sending        MANIFEST
Sending        lib/Mail/SpamAssassin/Conf/Parser.pm
Sending        lib/Mail/SpamAssassin/Conf.pm
Sending        lib/Mail/SpamAssassin/PerMsgStatus.pm
Sending        masses/mass-check
Adding         masses/plugins/Dumpmem.pm
Adding         t/duplicates.t
Sending        t/priorities.t
Transmitting file data ........
Committed revision 483650.





[Bug 5206] RFE: detect and merge duplicate rules for efficiency






------- Additional Comments From lwilton@earthlink.net  2006-11-24 16:37 -------
This might have wider applicability than just the case you mention.  SARE has
more than once made a rule as a meta component only to discover that it exists
in the main rules or in some other rules file.  It might be nice to have a
--warn-rules option that would turn on warning messages for things like detected
duplicate rules (and missing meta dependencies, and all the other nitty rule
warnings that people are now complaining about).
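
A rough sketch of what the proposed warnings could check follows. It is again written in Python for illustration only. The `--warn-rules` option is only a suggestion here, not an existing flag. `lint_rules` and the name -> (type, source) layout are hypothetical. Meta dependencies are found by simply treating every identifier in the meta expression as a rule name, which is a simplification.

```python
import re

# Any identifier in a meta expression is treated as a rule name (a simplification).
IDENT = re.compile(r"[A-Za-z_]\w*")


def lint_rules(rules):
    """Return warnings for duplicate rules and meta rules with missing deps.

    rules: dict of name -> (rule_type, source_text), in definition order.
    """
    warnings = []
    seen = {}  # (rule_type, source_text) -> first rule defined with it
    for name, (rtype, source) in rules.items():
        if rtype == "meta":
            # dict.fromkeys drops repeated names while keeping their order
            for dep in dict.fromkeys(IDENT.findall(source)):
                if dep not in rules:
                    warnings.append(f"meta rule {name} depends on undefined rule {dep}")
            continue
        key = (rtype, source.strip())
        if key in seen:
            warnings.append(f"rule {name} duplicates {seen[key]}")
        else:
            seen[key] = name
    return warnings


rules = {
    "RULE_FOO": ("header", "Foo =~ /bar/"),
    "RULE_BAZ": ("header", "Foo =~ /bar/"),
    "META_X": ("meta", "RULE_FOO && !RULE_GONE"),
}
for warning in lint_rules(rules):
    print("warning:", warning)
# warning: rule RULE_BAZ duplicates RULE_FOO
# warning: meta rule META_X depends on undefined rule RULE_GONE
```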



