Posted to commits@spamassassin.apache.org by co...@spamassassin.apache.org on 2004/11/18 01:42:41 UTC
[SpamAssassin Wiki] New: ReleaseGoals
Date: 2004-11-17T16:42:39
Editor: DanielQuinlan <qu...@pathname.com>
Wiki: SpamAssassin Wiki
Page: ReleaseGoals
URL: http://wiki.apache.org/spamassassin/ReleaseGoals
new developer page
New Page:
#pragma section-numbers off
== Overview ==
* lower resource usage: higher throughput and lower memory usage
* higher accuracy: lower FPs and lower FNs (rules, rules, rules... this also includes some notion of speeding up the mass-check process)
* convert optional/non-performance-sensitive code to plugins (I think this is lower priority, but we've often talked about it and it also helps achieve the first goal of lower resource usage)
== Anti-goals ==
* features: extra options, non-critical changes not related to the above goals, etc. (except perhaps in plugins)
* option bloat (except perhaps in plugins)
== Memory Usage ==
We should probably evolve some understanding of what we want to convert
to plugins. Here's the list mostly based on conversations with Theo,
Justin, and Michael:
* Razor
* DCC
* Pyzor
* SpamCop reporting
* nuke AWL and replace with "History" plugin
* TextCat
== Performance/Speed ==
* Predictive autolearn? Do a check before bayes_check; if we are likely to autolearn, open read/write instead of read-only. Can be implemented on the first bayes_check call.
* Don't bother caching full/decoded/etc. at start in PMS. How much caching do we do now? Is it done multiple times in PMS? May not be an issue due to references.
* short-circuiting (SC) ideas:
  * set certain rules as SC if hit
    * USER_IN_WHITELIST, USER_IN_BLACKLIST (not DEF)
    * BSP
    * HABEAS
  * allow SC on ham score (i.e. < #)
  * allow SC on spam score (i.e. > #)
  * should autolearn skip SC msgs? should we always do autolearn in the appropriate direction?
  * AWL should be skipped during SC
  * SC rules should have a negative priority so they run first
  * do *not* do score check per rule, do it either per priority or rule type (header, body, etc.)
  * SC will require is_spam SC as score + required_hits will be at odds
  * add SC header macro (get_tag)
  * SC for S/O 1.000 rules? how about S/O near 1? BAYES_99, etc.
* Some form of order/priority rearrangement:
  * Blacklist: short
  * Whitelist: user/admin wants it
  * BSP/Habeas: reputable, non-forgeable
  * Other SC Rules: as early as possible
  * Other Local Rules: lightweight
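
The short-circuit notes above could be sketched roughly as follows. This is only an illustration in Python, not SpamAssassin code: the `Rule` class, the `scan` function, and the `ham_limit`/`spam_limit` parameters are all hypothetical names. It assumes rules are grouped by priority (lowest first, so SC rules with a negative priority run first) and that the SC check happens once per priority group rather than after every rule.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record; not the real SpamAssassin rule structure."""
    name: str
    score: float
    priority: int                      # lower runs first; SC rules would be negative
    matches: Callable[[str], bool]     # stand-in for header/body/etc. tests
    short_circuit: bool = False


def scan(
    message: str,
    rules: Iterable[Rule],
    ham_limit: Optional[float] = None,    # SC if score drops below this
    spam_limit: Optional[float] = None,   # SC if score rises above this
) -> Tuple[float, List[str], Optional[str]]:
    """Return (score, names of rules hit, reason for short-circuit or None)."""
    total = 0.0
    hits: List[str] = []
    ordered = sorted(rules, key=lambda r: r.priority)

    for _priority, group in groupby(ordered, key=lambda r: r.priority):
        sc_rule: Optional[str] = None
        for rule in group:
            if rule.matches(message):
                total += rule.score
                hits.append(rule.name)
                if rule.short_circuit and sc_rule is None:
                    sc_rule = rule.name

        # The SC decision is made once per priority group, not per rule.
        if sc_rule is not None:
            return total, hits, f"rule:{sc_rule}"
        if spam_limit is not None and total > spam_limit:
            return total, hits, "spam_score"
        if ham_limit is not None and total < ham_limit:
            return total, hits, "ham_score"

    return total, hits, None
```

Returning the reason separately from the score leaves room for an explicit is_spam-style verdict and for an SC header tag. Skipping AWL and deciding what autolearn does with SC messages are left out of the sketch.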
== Speed Release Cycle ==
* Single-cycle mass-check
* Add sample-based "autolearning" to mass-check
* One run with network and bayes turned on
* Related, but non-required change to autolearning: the balancing of in and out (accuracy)
== Accuracy Ideas ==
* network test, do DNS lookups on the HELO (A, NS, and SURBL)
* network test, do DNS lookups on the EnvelopeFrom (SURBL)
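
As a rough illustration of the two network tests above, the following Python sketch only builds the list of DNS queries that would be issued; it performs no lookups. The function names are hypothetical, `multi.surbl.org` is assumed as the default zone, and taking the last two labels as the registered domain is a simplifying assumption (it is wrong for names such as `example.co.uk`). Which name the NS lookup targets is also an assumption.

```python
from typing import List, Optional, Tuple

SURBL_ZONE = "multi.surbl.org"  # assumed default zone for this sketch

Query = Tuple[str, str]  # (query name, record type)


def registered_domain(hostname: str) -> Optional[str]:
    """Naive guess: last two labels. Assumption only; ignores ccTLD rules."""
    labels = [p for p in hostname.strip().strip(".").lower().split(".") if p]
    if len(labels) < 2:
        return None
    return ".".join(labels[-2:])


def helo_lookups(helo: str, zone: str = SURBL_ZONE) -> List[Query]:
    """Queries for a HELO name: A, NS, and a SURBL-style list lookup."""
    host = helo.strip().strip(".").lower()
    if host.startswith("["):
        return []  # bracketed address literal; bare IPs are not handled here
    domain = registered_domain(host)
    if domain is None:
        return []
    return [(host, "A"), (domain, "NS"), (f"{domain}.{zone}", "A")]


def envelope_from_lookups(sender: str, zone: str = SURBL_ZONE) -> List[Query]:
    """SURBL-style list lookup for the EnvelopeFrom domain."""
    address = sender.strip().strip("<>")
    if "@" not in address:
        return []  # null sender or malformed address
    domain = registered_domain(address.rsplit("@", 1)[1])
    if domain is None:
        return []
    return [(f"{domain}.{zone}", "A")]
```

For example, `helo_lookups("mail.example.com")` yields an A query for the host, an NS query for `example.com`, and an A query for `example.com.multi.surbl.org`.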
== Uncategorized ==