Posted to dev@spamassassin.apache.org by Daniel Quinlan <qu...@pathname.com> on 2004/08/25 02:09:28 UTC

daily updates

Okay, here's my proposal for official daily updates for 3.0:

Basic design:

  - updates are a set of files that are used to supplement official
    releases beginning with 3.0.0 -- initially, these will be .cf files
    beginning with a number from 80 to 89.
  - updates are published for each individual release: 3.0.0, 3.0.1, etc.
  - it is expected that most updates will be equivalent across a
    particular set of releases using the same rule base (3.0.x will all be
    linked, etc.)
  - updates will be pulled or pushed (TBD) about once a day initially 
  - all updates will be tested using a nightly corpus test system and
    will eventually be scored using the perceptron
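
As a rough illustration of the naming and per-release ideas above, here is
a minimal Python sketch.  The underscore-separated file name pattern is an
assumption borrowed from names like 70_testing.cf (the proposal only says
"beginning with a number from 80 to 89"), and is_update_file and rule_base
are hypothetical helper names, not part of SpamAssassin.

```python
import re

# Assumed pattern: a number from 80 to 89, an underscore, a name, ".cf".
# The underscore mirrors existing names such as 70_testing.cf; the
# proposal itself only fixes the 80-89 numeric prefix.
UPDATE_FILE_RE = re.compile(r"8[0-9]_[A-Za-z0-9_]+\.cf")


def is_update_file(filename: str) -> bool:
    """Return True if the file name falls in the 80-89 update range."""
    return UPDATE_FILE_RE.fullmatch(filename) is not None


def rule_base(release: str) -> str:
    """Map a release such as '3.0.1' to its shared rule base, '3.0.x'."""
    major, minor, _patch = release.split(".")
    return f"{major}.{minor}.x"
```

With this sketch, 3.0.0 and 3.0.1 both map to the "3.0.x" rule base, which
is one way the linked updates could be keyed.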

New rules:

  - primary rule sources:
     * 70_testing.cf rules from HEAD
     * user submissions
  - target missed spam
  - checked into SVN regularly
  - an SVN revision is only released as an update after passing a QA
    process that includes lint, positive corpus results, and a visual
    inspection by a developer; actual updates will probably not be
    daily at first, but daily is the target
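
The QA gate described above could be sketched roughly as follows.  This is
only an illustration: the Candidate fields, the passes_qa function, and
both thresholds are hypothetical, since the proposal does not define what
counts as "positive corpus results".  The lint flag is assumed to come from
something like running `spamassassin --lint` against the candidate rules.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; the proposal does not specify any numbers.
MIN_SPAM_HITS = 10      # rule must catch at least this many spam messages
MAX_HAM_RATIO = 0.01    # at most 1% of all hits may be on ham


@dataclass
class Candidate:
    svn_revision: int
    lint_ok: bool                # outcome of a lint run on the rules
    spam_hits: int               # hits on the spam corpus
    ham_hits: int                # hits on the ham (non-spam) corpus
    reviewed_by: Optional[str]   # developer who did the visual inspection


def passes_qa(c: Candidate) -> bool:
    """Release a revision only if lint, corpus, and review all pass."""
    if not c.lint_ok:
        return False
    if c.reviewed_by is None:
        return False
    if c.spam_hits < MIN_SPAM_HITS:
        return False
    total_hits = c.spam_hits + c.ham_hits
    return c.ham_hits / total_hits <= MAX_HAM_RATIO
```

A revision that fails any one of the three checks simply is not published,
which is why actual updates may be less frequent than daily.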

Update software:

  - worry about distribution last, worry about generation and
    maintenance of rules first
  - possible protocols: rsync and http
  - apache.org mirror system?
  - daemon vs. cron job
  - pull vs. push
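
To make the pull option a little more concrete, here is a small sketch of
the decision a cron-driven client might make.  Everything in it is an
assumption: the per-release URL layout, the use of an SVN revision number
as the update version, and the names update_url and needs_pull are all
hypothetical, since the protocol and direction are still open questions.

```python
from typing import Optional


def update_url(base_url: str, release: str) -> str:
    """Build a per-release update location (hypothetical layout)."""
    return f"{base_url.rstrip('/')}/{release}/"


def needs_pull(local_revision: Optional[int], published_revision: int) -> bool:
    """Pull if nothing is installed yet or a newer revision is published."""
    if local_revision is None:
        return True
    return published_revision > local_revision
```

A daily cron job could call needs_pull and fetch over http or rsync only
when it returns True; a push design would move that decision to the server.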

Daniel

Re: daily updates

Posted by Daniel Quinlan <qu...@pathname.com>.
Daniel Quinlan wrote:

>> Okay, here's my proposal for official daily updates for 3.0:

Fred  <te...@i-is.com> writes:

> Would it be possible to include the work of SARE in this process?  The use
> of a nightly corpus test system would help validate whether our rules are
> good, and the perceptron could assign scores.  We're still active, just not
> very noisy.

It's definitely possible for some (most?) SARE work:

 - those rule sets we *can* use (we need clear authorship, and the ASF
   needs Contributor License Agreements)

 - this would probably focus on non-heavyweight, cleanly designed,
   high-accuracy, etc. rule sets

We use SARE ideas *quite* often now, but we use SARE code (rules) less
often, unfortunately.  That's usually because it takes a lot of time to
rewrite rules, which we must do when we don't have permission to use
them or for technical reasons.  Only a few of our current committers do
the vast majority of rule development and integration, so we are a bit
resource-constrained.

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/

Re: daily updates

Posted by Fred <te...@i-is.com>.
Daniel Quinlan wrote:
> Okay, here's my proposal for official daily updates for 3.0:

Would it be possible to include the work of SARE in this process?  The use
of a nightly corpus test system would help validate whether our rules are
good, and the perceptron could assign scores.  We're still active, just not
very noisy.