Posted to commits@spamassassin.apache.org by Apache Wiki <wi...@apache.org> on 2005/10/06 03:34:17 UTC

[Spamassassin Wiki] Update of "RulesProjPromotion" by JustinMason

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Spamassassin Wiki" for change notification.

The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/RulesProjPromotion

The comment on the change is:
update to match current plans

------------------------------------------------------------------------------
  
  == Moving files out of trunk into the new rules project ==
  
+ Code-tied rules stay with the main tree in the current rules directory, with the
+ exception of 25_replace.cf, which is really just another way to write
+ body/header rules.  Basically, the static stuff that is tied to code does not
+ move to the rules project.  Everything else moves.
- JustinMason: If we're going to start pulling rules from sandboxes into core/ in
- the above fashion, but we leave the current ruleset intact in the
- core as well, things will get messy.
- I propose we move the current core ruleset into a sandbox, called
- 'rules/sandbox/legacy/'.  The good rules that pass the above selection
- criteria, get promoted as any other rules from other sandboxes do, into the new
- 'core/'; the old, stale rules (of which we have a few), will not get back into
- core.
- 
- DanielQuinlan: vetoed.  Instead: code-tied rules stay with main tree in current
- rules directory, with the exception of 25_replace.cf which is really just
- another way to write body/header rules.   Basically, the static stuff that is
- tied to code does not move to the rules project.
  
  In more detail -- files that DO NOT move to rules project:
  
@@ -107, +98 @@

     60_whitelist_spf.cf  -> ROOT/rules/trunk/core/
  }}}
  
- Files that get deleted: 20_anti_ratware.cf: it's empty.  [DONE]
+ == Algorithm for compilation ==
  
- JustinMason: ok, that looks good -- except for one thing.  We still have the problem that ROOT/rules/trunk/core/ is going to be a mix of legacy files and auto-promoted rules.  What do we do about that problem?
+ The {{{ROOT/rules/trunk}}} svn path is now the rules source directory.
  
- DanielQuinlan: the auto-promoted .cf file should be 100% machine generated and overwritten each night (or whatever the period is).   Once a rule is promoted into core, it'll disappear from the auto-promoted file because (a) overlap test dictates so or (b) the non-core file that contained the file will no longer contain it (or we could use a comment, rename the rule, etc. to indicate that it is no longer a candidate for auto-promotion if the author wants to keep it around).
+ The {{{ROOT/trunk/rules}}} svn path -- i.e. "rules" in the SpamAssassin source tree -- is the
+ rules build output directory.
  
- JustinMason: update -- here's the script that will be run to perform these renames:
- http://taint.org/xfer/2005/svnrenames
+ Rules are compiled from the source dir to the output dir.  All rules in "core"
+ are always promoted (for backwards compatibility).  In addition, rules in the
+ sandboxes will be promoted if the rules source file contains a
+ {{{publish core}}} command.  This command is added (by hand!) to the source
+ file by committers once the rules pass the validation criteria.
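
The promotion decision described above can be sketched roughly as follows. This is only an illustration, not the actual compiler: the function name `is_promoted`, the relative-path layout, and the assumption that {{{publish core}}} sits on a line of its own (with `#` starting a comment) are all hypothetical.

```python
from pathlib import PurePosixPath


def is_promoted(source_path: str, source_text: str) -> bool:
    """Decide whether the rules in one source file get compiled to the output dir.

    Assumptions (not from the wiki page): source_path is relative to the
    rules source directory, e.g. "core/20_foo.cf" or
    "sandbox/jmason/20_foo.cf", and the build command is written on its
    own line as "publish core".
    """
    # Rules in "core" are always promoted, for backwards compatibility.
    if "core" in PurePosixPath(source_path).parts:
        return True

    # Sandbox rules are promoted only if a committer added "publish core".
    for line in source_text.splitlines():
        tokens = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if tokens == ["publish", "core"]:
            return True
    return False
```

Because the mark is added by hand rather than derived from nightly FP%/FN% figures, a file's result here only changes when a committer edits the source file.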
  
+ Rules will be autorenamed if there's a collision between a new rule name and one that's already been output by the compiler.
  
- == Algorithm for auto-promotion ==
+ (TODO: autorenaming algorithms.  Currently thinking of appending the sanitized sandbox filename.)
  
- JustinMason: Aside from the criteria, we also need an idea of how the config file lines get from sandbox to core.  Here's my proposal.
+ The compiler will copy the rules to the output directory.  By default, the
+ filename is preserved; so a rule in a file called "20_foo.cf" in the source
+ directory will be output to the file "20_foo.cf".
  
+ (TODO: 'pubfile' is another command, {{{pubfile NN_filename.cf}}}, which
+ selects the name of the output file in the "rules" directory and overrides
+ that behaviour.)
- For each sandbox directory:
-  * iterate through all files in the dir
-  * if a config line refers to a rule name (e.g. "header", "describe", "tflags"), then:
-    * apply the validation criteria.  if the rule passes:
-      * output the line
-    * else:
-      * ignore the line and produce no output
-  * if the config line doesn't refer to a rule name, output the line.
-  * send that output to a file in ROOT/rules/trunk/core/ , named according to the sandbox directory's name.  e.g. lines from all files matching ROOT/rules/trunk/sandbox/jmason/*.cf would be output to ROOT/rules/trunk/core/25_jmason.cf
  
- DanielQuinlan: we'll need to work on the naming
+ (TODO: linting during compilation, and ignore lint-failures?  may have to reimplement a small subset of lint behaviour to do this.)
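
For illustration, the output-file naming and the proposed autorenaming could look something like the sketch below. Both pieces are marked TODO on this page, so everything here is an assumption: the helper names `output_filename` and `unique_rule_name`, the sanitization rule (non-alphanumeric runs become underscores, then upper-cased), and the use of an in-memory set of names already output.

```python
import re
from typing import Optional, Set


def output_filename(source_filename: str, pubfile: Optional[str] = None) -> str:
    """Preserve the source filename unless a 'pubfile' command overrides it."""
    return pubfile if pubfile else source_filename


def unique_rule_name(name: str, sandbox_filename: str, seen: Set[str]) -> str:
    """Return a rule name that does not collide with names already output.

    Hypothetical scheme: on collision, append the sanitized sandbox
    filename. A second collision on the renamed rule is not handled here.
    """
    if name not in seen:
        seen.add(name)
        return name
    # Sanitize: collapse anything that is not a letter or digit into "_".
    suffix = re.sub(r"[^A-Za-z0-9]+", "_", sandbox_filename).strip("_").upper()
    renamed = f"{name}_{suffix}"
    seen.add(renamed)
    return renamed
```

With these definitions, a rule from "20_foo.cf" lands in "20_foo.cf" by default, and a second rule named FOO coming from that file would be emitted as FOO_20_FOO_CF.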
  
- === The validation criteria ===
- 
- So, initially, I had this marked as "the criteria from Rule Promotion", above.  However, that didn't make sense; one aim of having a 'compiler' for this stuff was to avoid "flapping" when rules would pass criteria one day and fail the next, falling into and out of the distributable ruleset.  This would happen using those criteria, as they're FP%/FN%-based.
- 
- On review, this isn't what we'd initially discussed on IRC, and didn't make sense; I'd oversimplified during transcribing.
- 
- Instead the plan we'd agreed was to compile the rules files from the source dir to the output dir, and select rules which were marked as "promoted" in their source files.
- 
- The mark in question is through a build command in the source file, something like:
- 
- {{{
-     publish 1
- }}}
- 
- (suggestions welcome...)
- 
- other build commands include:
- 
-  * a command to select the name of the output file in the 'rules' output directory
-