Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2006/10/10 20:25:17 UTC
[Bug 5123] New: simplify rulesrc implementation
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5123
Summary: simplify rulesrc implementation
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Platform: Other
OS/Version: other
Status: NEW
Severity: major
Priority: P5
Component: Rules
AssignedTo: dev@spamassassin.apache.org
ReportedBy: jm@jmason.org
As discussed on the dev list last month, Theo believes that we need
to trim back some of the more baroque aspects of the rules project work.
I'm happy enough with that. Here's what's suggested (pasted from
earlier mails on dev):
Me: 'Hmm -- I think I'd need more details of how that'd work [..]'
Theo: 'Keep 3.1 the way it currently is, semi-revert 3.2 to have a rules
directory and we'll put a minimal set of 3.2 rules in there.
External to the normal distro, we have code that generates updates based on
the mass-check results (or whatever else we want to base them on).
[on what files are distributed:] I think our current methodology of:
- user downloads the engine, installs it, it comes with a core set of rules
- user is encouraged to run sa-update, and then run it periodically as well
to get the complete set of rules
works well. It doesn't require us to have everything in the engine
distribution though.'
So in other words, the distributed tarball consists of:
- engine
- core rules in "rules/" directory
the SVN build would consist of:
- engine
- core rules in "rules/" directory
- an SVN external for the "rulesrc/" directory
- and the build/mkrules script to compile the latter
the "sa-update" distributed rules tarballs would have:
- core rules in "rules/" directory
- plus the 'active' rules in "rules/" directory (in other words, the compiled
output from build/mkrules)
Another issue is version-dependent rulesets; these would move back
out of rulesrc, and back into the "rules/" directory, alongside
the engine whose code they depend on.
In terms of implementation, it's pretty simple:
1. Move the core rules files from "rulesrc/core" back into "rules".
1a. (Maybe trim out the rules that are currently showing up as
unusable in the rule-QA app, since there's doubtless some rule rot
by now...)
2. Change Makefile.PL to not run "build/mkrules" unless the "rulesrc"
directory is present; that will only be the case with an SVN
checkout rather than a "make dist" dir or tarball.
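The guard described in step 2 can be sketched as follows. This is only an illustration in Python, not the actual Makefile.PL change (Makefile.PL is Perl); the function names and the step strings are hypothetical, and the only thing taken from the plan above is the rule "run build/mkrules only if rulesrc/ exists".

```python
import tempfile
from pathlib import Path
from typing import List


def should_run_mkrules(source_root: Path) -> bool:
    """True only when the tree carries a rulesrc/ directory (an SVN checkout)."""
    return (source_root / "rulesrc").is_dir()


def plan_rules_build(source_root: Path) -> List[str]:
    """Return the (hypothetical) rule-related build steps for a source tree."""
    steps = ["install rules/*.cf"]
    if should_run_mkrules(source_root):
        # Only a checkout has rulesrc/, so only a checkout compiles rules first.
        steps.insert(0, "run build/mkrules")
    return steps


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tarball = Path(tmp) / "tarball"    # stands in for a "make dist" tree
        checkout = Path(tmp) / "checkout"  # stands in for an SVN checkout
        (tarball / "rules").mkdir(parents=True)
        (checkout / "rules").mkdir(parents=True)
        (checkout / "rulesrc").mkdir()
        print("tarball: ", plan_rules_build(tarball))
        print("checkout:", plan_rules_build(checkout))
```

The point of keying on the directory rather than on a flag is that a tarball user never has to know the compile step exists.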
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 5123] should 'rules/72_active.cf' be distributed in the distro tarball?
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5123
jm@jmason.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|simplify rulesrc            |should 'rules/72_active.cf'
                   |implementation              |be distributed in the distro
                   |                            |tarball?
[Bug 5123] simplify rulesrc implementation
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5123
jm@jmason.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
OtherBugsDependingOnThis|                       |4681
[Bug 5123] should 'rules/72_active.cf' be distributed in the distro tarball?
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5123
jm@jmason.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
------- Additional Comments From jm@jmason.org 2006-12-28 09:46 -------
ok, my last suggestion garnered no complaints, so let's go that way ;)
[Bug 5123] simplify rulesrc implementation
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5123
------- Additional Comments From spamassassin@dostech.ca 2006-10-13 11:23 -------
Sounds like a reasonable plan of action to me.
[Bug 5123] should 'rules/72_active.cf' be distributed in the distro tarball?
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5123
------- Additional Comments From jm@jmason.org 2006-11-14 10:59 -------
fwiw, I did the mucky task:
: jm 1009...; svn commit -m "bug 5123: remove some more vestiges of now-obsolete
eval rules, namely the following HTML flags and range variables: attr_seen_*
attr_unique_bad attr_bad html_event_unsafe big_font font_face_caps
font_invisible tiny_font max_shouting text_after_body text_after_html"
lib/Mail/SpamAssassin/HTML.pm
Sending lib/Mail/SpamAssassin/HTML.pm
Transmitting file data .
Committed revision 474920.
it had a nice side-effect of reducing scanner memory usage by 44KB.
[Bug 5123] simplify rulesrc implementation
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5123
jm@jmason.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|major                       |blocker
   Target Milestone|Undefined                   |3.2.0
------- Additional Comments From jm@jmason.org 2006-10-11 02:53 -------
aiming at 3.2.0 as a blocker; this is something we need to decide before that
release
[Bug 5123] simplify rulesrc implementation
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5123
------- Additional Comments From jm@jmason.org 2006-10-12 10:04 -------
is the silence indicative of general happiness? Warnock's dilemma strikes ;)
[Bug 5123] should 'rules/72_active.cf' be distributed in the distro tarball?
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5123
------- Additional Comments From sa-list@alexb.ch 2006-11-28 11:12 -------
There's a bunch of rules which are nicely used in metas. I don't think it's a good
idea to remove them totally. Maybe keep them in a separate file and score them 0.01.
Rules like HTML_IMAGE_RATIO_06 come in very handy against stock spams when
combined with, *for example*, __ANY_OUTLOOK_MUA and some Msg-Id rule.
Here are some examples I have in production:
META_OENOBDARY (__ANY_OUTLOOK_MUA && MIME_MISSING_BOUNDARY) # would stop
working if you remove MIME_MISSING_BOUNDARY
META_ULESSBIZ (BIZ_TLD && !__ANY_OUTLOOK_MUA && __HAS_URI &&
__HAS_X_MAILER!) # would be rendered useless without BIZ_TLD
I doubt I'm the only one who uses stuff like this to add the extra point to a
final score.
If there's any chance of keeping these rules, I'd work my way through the list and
a large collection of metas and check if & what is worth keeping.
Alex
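To make the dependency concern above concrete, here is a rough Python sketch that lists which meta rules would break if a given set of rules were removed. It is not part of SpamAssassin: the helper names are hypothetical, and the name-matching regex is a simplifying assumption that only pulls identifier-like tokens out of an expression without evaluating it. The two expressions are copied verbatim from the message above.

```python
import re

# Assumption: rule names look like identifiers, e.g. BIZ_TLD or __ANY_OUTLOOK_MUA.
RULE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def meta_dependencies(expression):
    """Return the set of rule names referenced by a meta expression."""
    return set(RULE_NAME.findall(expression))


def broken_metas(metas, removed_rules):
    """Map each meta name to the sorted list of removed rules it depends on."""
    removed = set(removed_rules)
    broken = {}
    for name, expression in metas.items():
        missing = meta_dependencies(expression) & removed
        if missing:
            broken[name] = sorted(missing)
    return broken


if __name__ == "__main__":
    metas = {
        "META_OENOBDARY": "(__ANY_OUTLOOK_MUA && MIME_MISSING_BOUNDARY)",
        "META_ULESSBIZ": "(BIZ_TLD && !__ANY_OUTLOOK_MUA && __HAS_URI && "
                         "__HAS_X_MAILER!)",
    }
    candidates_for_removal = {"MIME_MISSING_BOUNDARY", "BIZ_TLD"}
    for meta, missing in broken_metas(metas, candidates_for_removal).items():
        print(meta, "depends on removed rule(s):", ", ".join(missing))
```

Running a check like this over a local collection of metas before pruning the core ruleset would show which rules are still needed as subrules, even if they are scored at 0.01.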
[Bug 5123] should 'rules/72_active.cf' be distributed in the distro tarball?
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5123
------- Additional Comments From jm@jmason.org 2006-12-13 04:43 -------
I may see if we can resurrect those HTML_IMAGE_RATIO_* rules as meta subrules,
for Alex. Let's keep that a separate issue, in bug 5242.
In the meantime, back to the other issue... I'm going to vote that we should
distribute rules/72_active.cf in the main distro tarball. Otherwise, when we run
the Perceptron at release time, it will only be able to generate scores for the
rules in the main tarball, not for those taken from rulesrc via 72_active.cf.
This would mean that some of the best rules in the ruleset would go unscored.
Until we have a way to Perceptron-optimise rules for published rule updates, we
can't leave 72_active.cf out of the tarball.
[Bug 5123] simplify rulesrc implementation
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5123
------- Additional Comments From jm@jmason.org 2006-10-20 08:22 -------
(In reply to comment #3)
> Sounds like a reasonable plan of action to me.
OK. The low-hanging fruit is now complete -- that's parts (1) and (3). The next
thing to do is (2): spend some time going through
http://ruleqa.spamassassin.org/20061018-r465178-n , and remove the rules that
are greyed-out, in the core ruleset, and not required for other purposes.
Actually, there may also be a step 4. Currently, rules/72_active.cf -- i.e. the
compiled "good enough" rules from the rulesrc/sandbox/* tree -- is being
distributed in distro tarballs; however, I think Theo was considering that it
shouldn't be, and instead should only be shipped in sa-update tarballs.
Theo, are you really keen on that idea? I don't think it's too important, since
it will probably work fine the way it is now anyway...