Posted to commits@spamassassin.apache.org by co...@spamassassin.apache.org on 2004/11/18 01:42:41 UTC

[SpamAssassin Wiki] New: ReleaseGoals

   Date: 2004-11-17T16:42:39
   Editor: DanielQuinlan <qu...@pathname.com>
   Wiki: SpamAssassin Wiki
   Page: ReleaseGoals
   URL: http://wiki.apache.org/spamassassin/ReleaseGoals

   new developer page

New Page:

#pragma section-numbers off

== Overview ==

 * lower resource usage: higher throughput and lower memory usage
 * higher accuracy: fewer false positives (FPs) and false negatives (FNs) (rules, rules, rules... this also includes some notion of speeding up the mass-check process)
 * convert optional/non-performance-sensitive code to plugins (I think this is lower priority, but we've often talked about it and it also helps achieve the first goal of lower resource usage)

== Anti-goals ==

 * features: extra options, non-critical changes not related to the above goals, etc. (except perhaps in plugins)
 * option bloat (except perhaps in plugins)

== Memory Usage ==

We should probably reach some agreement on what we want to convert
to plugins.  Here's the list, mostly based on conversations with Theo,
Justin, and Michael:

 * Razor
 * DCC
 * Pyzor
 * SpamCop reporting
 * nuke AWL (auto-whitelist) and replace it with a "History" plugin
 * TextCat
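A minimal sketch of why moving optional code into plugins helps resource usage, written in Python purely for illustration (SpamAssassin itself is Perl, and none of the class or function names below are its real plugin API). Optional checks are registered as factories and are only instantiated when the configuration enables them, so a disabled plugin is never built. The plugin names mirror the list above; the classes are hypothetical placeholders, and a real implementation would likely go further and defer loading the module itself.

```python
"""Hypothetical plugin registry: optional checks are built only if enabled.

None of these names are SpamAssassin's actual plugin API.
"""

from typing import Callable, Dict, List


class Plugin:
    """Base class for an optional check (hypothetical)."""

    name = "base"

    def check(self, message: str) -> float:
        """Return this plugin's score contribution for a message."""
        raise NotImplementedError


class RazorLikePlugin(Plugin):
    name = "Razor"

    def check(self, message: str) -> float:
        # Placeholder: a real plugin would query a network service here.
        return 0.0


class TextCatLikePlugin(Plugin):
    name = "TextCat"

    def check(self, message: str) -> float:
        # Placeholder: a real plugin would guess the message language here.
        return 0.0


# Factories, not instances: nothing is constructed until it is enabled.
REGISTRY: Dict[str, Callable[[], Plugin]] = {
    "Razor": RazorLikePlugin,
    "TextCat": TextCatLikePlugin,
}


def load_plugins(enabled: List[str]) -> List[Plugin]:
    """Instantiate only the configured plugins; unknown names are ignored."""
    return [REGISTRY[name]() for name in enabled if name in REGISTRY]
```

With an empty configuration `load_plugins([])` builds nothing, which is the point: the cost of an optional feature is paid only by sites that turn it on.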

== Performance/Speed ==

 * Predictive autolearn?  Do the check before bayes_check: if we are likely to autolearn, open the Bayes DB read/write instead of read-only.  Can be implemented on the first bayes_check call.
 * Don't bother caching the full/decoded/etc. message text at the start in PMS (PerMsgStatus).  How much caching do we do now?  Multiple times in PMS?  May not be an issue due to references.
 * short-circuiting (SC) ideas:
   * set certain rules as SC if hit:
     * USER_IN_WHITELIST, USER_IN_BLACKLIST (not the DEF variants)
     * BSP
     * HABEAS
   * allow SC on ham score (i.e., score < some threshold)
   * allow SC on spam score (i.e., score > some threshold)
   * should autolearn skip SC messages?  Or should we always autolearn in the appropriate direction?
   * AWL should be skipped during SC
   * SC rules should have a negative priority so they run first
   * do *not* do the score check per rule; do it either per priority or per rule type (header, body, etc.)
   * SC will require setting is_spam directly, since the score and required_hits may then be at odds
   * add an SC header macro (get_tag)
   * SC for rules with S/O 1.000?  How about S/O near 1?  BAYES_99, etc.
   * some form of order/priority rearrangement:
     * Blacklist:               short
     * Whitelist:               user/admin wants it
     * BSP/Habeas:              reputable, non-forgeable
     * Other SC Rules:          as early as possible
     * Other Local Rules:       lightweight
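The short-circuit ideas above can be sketched as follows, again in Python for illustration only. The `Rule` and `Result` types, the `scan()` function, and the `spam_sc`/`ham_sc` threshold parameters are all hypothetical, not SpamAssassin internals. The sketch runs rules in ascending priority (so negative-priority SC rules go first), checks the score once per priority group rather than once per rule, and sets `is_spam` explicitly when it stops early instead of deriving it from `required`.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, List, Optional


@dataclass
class Rule:
    name: str
    priority: int                        # lower runs first; SC rules use negative values
    score: float
    test: Callable[[str], bool]          # returns True if the rule hits
    short_circuit: Optional[str] = None  # "spam", "ham", or None (hypothetical flag)


@dataclass
class Result:
    score: float
    is_spam: bool
    hits: List[str]
    short_circuited: Optional[str]       # rule name or threshold that stopped the scan


def scan(message: str, rules: List[Rule], required: float,
         spam_sc: float, ham_sc: float) -> Result:
    """Run rules by priority group, stopping early on SC hits or thresholds."""
    score = 0.0
    hits: List[str] = []
    # sorted() is stable, so rules keep their input order within a priority.
    ordered = sorted(rules, key=lambda r: r.priority)
    for _, group in groupby(ordered, key=lambda r: r.priority):
        sc_verdict: Optional[bool] = None
        sc_name: Optional[str] = None
        for rule in group:
            if rule.test(message):
                score += rule.score
                hits.append(rule.name)
                if rule.short_circuit and sc_verdict is None:
                    sc_verdict = rule.short_circuit == "spam"
                    sc_name = rule.name
        # Score checks happen once per priority group, not once per rule.
        if sc_verdict is not None:
            # The verdict comes from the SC rule, not from score vs. required.
            return Result(score, sc_verdict, hits, sc_name)
        if score >= spam_sc:
            return Result(score, True, hits, "score>=spam_sc")
        if score <= ham_sc:
            return Result(score, False, hits, "score<=ham_sc")
    # No short circuit: the normal required-score decision applies.
    return Result(score, score >= required, hits, None)
```

Returning which rule or threshold stopped the scan gives callers a hook for the open questions above, such as skipping AWL or autolearn on short-circuited messages, or exposing the reason through a header tag.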

== Speed Release Cycle ==

 * Single-cycle mass-check
   * Add sample-based "autolearning" to mass-check
   * One run with network and bayes turned on
   * Related but optional change to autolearning: the balancing of in and out (for accuracy)

== Accuracy Ideas ==

 * network test, do DNS lookups on the HELO (A, NS, and SURBL)
 * network test, do DNS lookups on the EnvelopeFrom (SURBL)
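A minimal sketch of building the lookups for the two network tests above, in Python for illustration. The helper names and the `multi.surbl.org` zone are assumptions, and the sketch only constructs query names; it does not perform any DNS queries. It also skips a step a real implementation would likely need: SURBL lists registered domains, so a HELO hostname such as `mail.example.com` would normally be reduced to `example.com` (for instance with the Public Suffix List) before querying.

```python
from typing import Dict, List

SURBL_ZONE = "multi.surbl.org"  # assumption: the combined SURBL list zone


def helo_queries(helo: str) -> Dict[str, List[str]]:
    """Map record type to the names to look up for a HELO argument."""
    name = helo.strip().rstrip(".").lower()
    # Skip empty names, bare words like "localhost", and IP literals "[a.b.c.d]".
    if not name or "." not in name or name.startswith("["):
        return {}
    return {
        "A": [name],
        "NS": [name],
        "SURBL": [f"{name}.{SURBL_ZONE}"],
    }


def envelope_from_queries(envelope_from: str) -> Dict[str, List[str]]:
    """Return the SURBL lookup for the domain part of the envelope sender."""
    _, sep, domain = envelope_from.strip().strip("<>").rpartition("@")
    domain = domain.rstrip(".").lower()
    # A null sender "<>" (bounces) or a domain without a dot gets no lookup.
    if not sep or "." not in domain:
        return {}
    return {"SURBL": [f"{domain}.{SURBL_ZONE}"]}
```

For example, `envelope_from_queries("<user@Example.COM>")` yields a single SURBL lookup for `example.com.multi.surbl.org`, while a bounce with a null sender yields none.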

== Uncategorized ==