Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2006/10/10 20:25:17 UTC

[Bug 5123] New: simplify rulesrc implementation

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5123

           Summary: simplify rulesrc implementation
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: major
          Priority: P5
         Component: Rules
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: jm@jmason.org


As discussed on the dev list last month, Theo believes that we need
to trim back some of the more baroque aspects of the rules project work.
I'm happy enough with that.  Here's what's suggested (pasted from
earlier mails on dev):

  Me: 'Hmm -- I think I'd need more details of how that'd work [..]'

  Theo: 'Keep 3.1 the way it currently is, semi-revert 3.2 to have a rules
  directory and we'll put a minimal set of 3.2 rules in there.
  External to the normal distro, we have code that generates updates based on
  the mass-check results (or whatever else we want to base them on).

  [on what files are distributed:] I think our current methodology of:

  - user downloads the engine, installs it, it comes with a core set of rules
  - user is encouraged to run sa-update, and then run it periodically as well
    to get the complete set of rules

  works well.  It doesn't require us to have everything in the engine
  distribution though.'



So in other words, the distributed tarball consists of:

  - engine
  - core rules in "rules/" directory

the SVN build would consist of:

  - engine
  - core rules in "rules/" directory
  - an SVN external for the "rulesrc/" directory
  - and the build/mkrules script to compile the latter

the "sa-update" distributed rules tarballs would have:

  - core rules in "rules/" directory
  - plus the 'active' rules in "rules/" directory (in other words, the compiled
    output from build/mkrules)
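
As a rough illustration only, the three lists above can be written down as
sets and compared. All names here are hypothetical labels for the components
listed above, not identifiers from the SpamAssassin tree.

```python
# Hypothetical labels paraphrasing the components listed above.
ENGINE = "engine"
CORE_RULES = "core rules in rules/"
RULESRC_EXTERNAL = "SVN external for rulesrc/"
MKRULES = "build/mkrules script"
ACTIVE_RULES = "active rules compiled into rules/"

# What each artifact would contain under the proposal.
ARTIFACTS = {
    "distro_tarball": {ENGINE, CORE_RULES},
    "svn_build": {ENGINE, CORE_RULES, RULESRC_EXTERNAL, MKRULES},
    "sa_update_tarball": {CORE_RULES, ACTIVE_RULES},
}


def only_in(artifact, other):
    """Return the components present in `artifact` but absent from `other`."""
    return ARTIFACTS[artifact] - ARTIFACTS[other]


if __name__ == "__main__":
    # Components an SVN checkout has that a release tarball lacks.
    print(sorted(only_in("svn_build", "distro_tarball")))
```

Seen this way, the only things an SVN build adds over the distributed tarball
are the rulesrc external and the script that compiles it.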


Another issue is version-dependent rulesets; these would move back
out of rulesrc, and back into the "rules/" directory, alongside
the engine whose code they depend on.

In terms of implementation, it's pretty simple:

1. Move the core rules files from "rulesrc/core" back into "rules".

1a. (Maybe trim out the rules that are currently showing up as
unusable in the rule-QA app, since there's doubtless some rule rot
by now...)

2. Change Makefile.PL to not run "build/mkrules" unless the "rulesrc"
directory is present; that will only be the case with an SVN
checkout rather than a "make dist" dir or tarball.
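
The check in step 2 amounts to a directory-existence test. The real change
would live in Makefile.PL (Perl); the following is only a sketch of the logic
in Python, and the function name should_run_mkrules is made up for
illustration.

```python
import os


def should_run_mkrules(top_dir):
    """Return True only when a rulesrc/ directory exists under top_dir.

    Assumption from the description above: rulesrc/ is present in an SVN
    checkout but absent from a "make dist" dir or tarball.
    """
    return os.path.isdir(os.path.join(top_dir, "rulesrc"))


if __name__ == "__main__":
    if should_run_mkrules("."):
        print("rulesrc/ found: build/mkrules would be run")
    else:
        print("no rulesrc/: skipping build/mkrules")
```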



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5123] should 'rules/72_active.cf' be distributed in the distro tarball?

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5123


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|simplify rulesrc            |should 'rules/72_active.cf'
                   |implementation              |be distributed in the distro
                   |                            |tarball?







[Bug 5123] simplify rulesrc implementation

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5123


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
OtherBugsDependingO|                            |4681
              nThis|                            |







[Bug 5123] should 'rules/72_active.cf' be distributed in the distro tarball?

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5123


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




------- Additional Comments From jm@jmason.org  2006-12-28 09:46 -------
ok, my last suggestion garnered no complaints, so let's go that way ;)




[Bug 5123] simplify rulesrc implementation

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5123





------- Additional Comments From spamassassin@dostech.ca  2006-10-13 11:23 -------
Sounds like a reasonable plan of action to me.




[Bug 5123] should 'rules/72_active.cf' be distributed in the distro tarball?

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5123





------- Additional Comments From jm@jmason.org  2006-11-14 10:59 -------
fwiw, I did the mucky task:

: jm 1009...; svn commit -m "bug 5123: remove some more vestiges of now-obsolete
eval rules, namely the following HTML flags and range variables: attr_seen_*
attr_unique_bad attr_bad html_event_unsafe big_font font_face_caps
font_invisible tiny_font max_shouting text_after_body text_after_html"
lib/Mail/SpamAssassin/HTML.pm
Sending        lib/Mail/SpamAssassin/HTML.pm
Transmitting file data .
Committed revision 474920.

it had a nice side-effect of reducing scanner memory usage by 44KB.




[Bug 5123] simplify rulesrc implementation

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5123


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|major                       |blocker
   Target Milestone|Undefined                   |3.2.0




------- Additional Comments From jm@jmason.org  2006-10-11 02:53 -------
aiming at 3.2.0 as a blocker; this is something we need to decide before that
release




[Bug 5123] simplify rulesrc implementation

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5123





------- Additional Comments From jm@jmason.org  2006-10-12 10:04 -------
is the silence indicative of general happiness?  Warnock's dilemma strikes ;)




[Bug 5123] should 'rules/72_active.cf' be distributed in the distro tarball?

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5123





------- Additional Comments From sa-list@alexb.ch  2006-11-28 11:12 -------
There's a bunch of rules which are nicely used in metas. I don't think it's a
good idea to remove them totally. Maybe keep them in a separate file and score
them 0.01.

Stuff like HTML_IMAGE_RATIO_06 comes in very handy against stock spams when
combined with, *for example*, __ANY_OUTLOOK_MUA and some Msg-Id rule.

here's some examples I have in production:

META_OENOBDARY      (__ANY_OUTLOOK_MUA && MIME_MISSING_BOUNDARY) # would stop
working if you remove MIME_MISSING_BOUNDARY

META_ULESSBIZ        (BIZ_TLD && !__ANY_OUTLOOK_MUA && __HAS_URI &&
__HAS_X_MAILER!) # would be rendered useless without BIZ_TLD

I doubt I'm the only one who uses stuff like this to add the extra point to a
final score.

If there's any chance of keeping these rules, I'd work my way through the list
and a large collection of metas and check if & what is worth keeping.
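
A check like that could be scripted along these lines. This is only a sketch:
it assumes the metas are available as a name-to-expression mapping, the
function name broken_metas is made up, and it just looks for rule-name tokens
in each expression rather than parsing it.

```python
import re

# A rule name: letters, digits and underscores, not starting with a digit.
RULE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def broken_metas(metas, removed):
    """Map each meta name to the removed rules its expression refers to."""
    removed = set(removed)
    broken = {}
    for name, expr in metas.items():
        hits = sorted(set(RULE_NAME.findall(expr)) & removed)
        if hits:
            broken[name] = hits
    return broken


if __name__ == "__main__":
    # The two production metas quoted above, expressions copied as written.
    metas = {
        "META_OENOBDARY": "(__ANY_OUTLOOK_MUA && MIME_MISSING_BOUNDARY)",
        "META_ULESSBIZ": "(BIZ_TLD && !__ANY_OUTLOOK_MUA && __HAS_URI && "
                         "__HAS_X_MAILER!)",
    }
    for meta, rules in broken_metas(
            metas, ["MIME_MISSING_BOUNDARY", "BIZ_TLD"]).items():
        print(meta, "depends on removed rule(s):", ", ".join(rules))
```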

Alex





[Bug 5123] should 'rules/72_active.cf' be distributed in the distro tarball?

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5123





------- Additional Comments From jm@jmason.org  2006-12-13 04:43 -------
I may see if we can resurrect those HTML_IMAGE_RATIO_* rules as meta subrules,
for Alex. Let's keep that a separate issue, in bug 5242.

In the meantime, back to the other issue... I'm going to vote that we should
distribute rules/72_active.cf in the main distro tarball.  Basically, otherwise,
when we run the Perceptron at release time, it will only be able to make scores
for the rules in the main tarball, not for those taken from rulesrc via
72_active.cf.

This would mean that some of the best rules in the ruleset would go unscored.
Until we have a way to Perceptron-optimise rules for published rule updates, we
can't leave 72_active.cf out of the main tarball.




[Bug 5123] simplify rulesrc implementation

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5123





------- Additional Comments From jm@jmason.org  2006-10-20 08:22 -------
(In reply to comment #3)
> Sounds like a reasonable plan of action to me.

OK.  the low-hanging fruit is now complete -- that's part (1) and (3).  next
thing to do is (2) spend some time going through
http://ruleqa.spamassassin.org/20061018-r465178-n , and removing the rules that
are greyed-out, in the core ruleset, and not required for other purposes.

Actually, there may also be a step 4.  currently, rules/72_active.cf -- ie. the
compiled "good enough" rules from the rulesrc/sandbox/* tree -- is being
distributed in distro tarballs; however, I think Theo was considering that it
shouldn't be, and instead should only be shipped in sa-update tarballs.

Theo, are you really keen on that idea?  I don't think it's too important, since
it will probably work fine the way it is now anyway...



