Posted to commits@spamassassin.apache.org by Apache Wiki <wi...@apache.org> on 2014/09/25 22:18:21 UTC
[Spamassassin Wiki] Update of "ImproveAccuracy" by Darxus

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Spamassassin Wiki" for change notification.

The "ImproveAccuracy" page has been changed by Darxus:
https://wiki.apache.org/spamassassin/ImproveAccuracy?action=diff&rev1=12&rev2=13

Comment:
Changing the score for TextCat

  = How to Improve SpamAssassin Accuracy =
- 
  == Run a recent version ==
- Regular updates of SpamAssassin 3.2.x rules stopped in 2008.  Accuracy depends on more recent rules.  Upgrade to 3.3.0 or newer.  
+ Regular updates of SpamAssassin 3.2.x rules stopped in 2008.  Accuracy depends on more recent rules.  Upgrade to 3.3.0 or newer.
  
  == Run sa-update daily ==
  Some SpamAssassin packages set this up for you, but verify that sa-update runs daily from cron so that you receive the latest SpamAssassin rules, which are generated every day.
@@ -11, +10 @@

  (On Debian based systems, set "CRON=1" in /etc/default/spamassassin - this is not the default.)
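
A daily job could be sketched as the wrapper below. It relies on the exit codes of sa-update (0 when new rules were installed, 1 when none were available). The function name `update_rules`, the variables `SA_UPDATE` and `RELOAD_CMD`, and the default reload command are assumptions to adapt to your system.

```shell
#!/bin/sh
# Sketch of a daily rule update job, meant to be called from cron.
# SA_UPDATE and RELOAD_CMD are assumptions; adjust both for your system.
SA_UPDATE="${SA_UPDATE:-sa-update}"
RELOAD_CMD="${RELOAD_CMD:-service spamassassin reload}"

update_rules() {
    status=0
    "$SA_UPDATE" || status=$?
    case "$status" in
        0) $RELOAD_CMD ;;   # new rules were installed: reload spamd
        1) return 0 ;;      # no new rules available: nothing to do
        *) echo "sa-update failed with status $status" >&2
           return "$status" ;;
    esac
}
```

Reloading only when sa-update reports new rules avoids restarting spamd needlessly every day.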
  
  == Enable network rules ==
- This is the default, but disabling network rules (including DNS rules) causes SpamAssassin to be wrong on about 3 times more emails.  Network tests may have been disabled by running spamassassin or spamd with the command line arguments {{{-L}}} or {{{--local}}}.  DNS rules may have been disabled with "{{{dns_available no}}}" in local.cf.
+ This is the default, but disabling network rules (including DNS rules) causes SpamAssassin to be wrong on about 3 times more emails.  Network tests may have been disabled by running spamassassin or spamd with the command line arguments {{{-L}}} or {{{--local}}}.  DNS rules may have been disabled with "{{{dns_available no}}}" in local.cf. You should run a local caching DNS server for efficiency.
- You should run a local caching DNS server for efficiency.
  
  As of 2011-12-21, without network tests, SpamAssassin is wrong 2.58 times as often on non-spam, and 3.40 times as often on spam.
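
To check whether network tests have been switched off, something like the following helpers may be useful. Both function names are hypothetical, and the check for {{{-L}}} only looks at flags written on their own, so combined short options would be missed.

```shell
# Hypothetical helpers for spotting disabled network tests.

# Succeeds if the given config file has an active "dns_available no" line.
# Commented-out lines are ignored.
dns_disabled() {
    grep -Eq '^[[:space:]]*dns_available[[:space:]]+no([[:space:]]|$)' "$1"
}

# Succeeds if a command line string includes -L or --local as a separate word.
local_only() {
    case " $1 " in
        *" -L "*|*" --local "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `dns_disabled /etc/spamassassin/local.cf && echo "DNS tests are off"` (the path is an assumption; use wherever your local.cf lives).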
  
  == Install Pyzor and Razor ==
  These are two helper applications with useful (network) rules.  If they're installed correctly, the debug output of SpamAssassin will include:
+ 
  {{{
  Apr 14 16:24:37.315 [4709] dbg: plugin: loading Mail::SpamAssassin::Plugin::Pyzor from @INC
  Apr 14 16:24:37.318 [4709] dbg: pyzor: network tests on, attempting Pyzor
  Apr 14 16:24:37.318 [4709] dbg: plugin: loading Mail::SpamAssassin::Plugin::Razor2 from @INC
  Apr 14 16:24:37.381 [4709] dbg: razor2: razor2 is available, version 2.84
  }}}
- 
  == Trusted Networks settings ==
  Ensure that internal_networks and trusted_networks are set correctly.  SpamAssassin often infers the right values by default, but if you receive a significant portion of your email via a trusted relay, that relay must be listed manually in one of these settings; otherwise the wrong hop will be used for things like DNS blacklist tests.  More info at TrustPath.
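
As a rough illustration, a local.cf fragment for a single relay might look like the one below. The address is a placeholder from the 192.0.2.0/24 documentation range, and the assumption here is that the relay is part of your own mail infrastructure; a relay you only trust not to forge headers would go in trusted_networks alone.

```
# Hypothetical relay address; replace with your own.
trusted_networks  192.0.2.25
internal_networks 192.0.2.25
```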
  
  == Verify AWL and the Bayesian classifier aren't poisoned ==
  The AutoWhitelist, and the Bayesian classifier when automatically trained, can be trained incorrectly, resulting in wrong scores.  Verify that they provide useful scores (the AWL and BAYES_* tests): positive scores for spam, and negative scores for ham.  If their scores are counterproductive, the only solutions are to delete the relevant databases and retrain them from scratch, or to disable them with:
+ 
  {{{
  use_auto_whitelist 0
  use_bayes 0
  }}}
+ To only disable automatic training of the Bayesian classifier:
  
- To only disable automatic training of the Bayesian classifier:
  {{{
  bayes_auto_learn 0
  }}}
+ To remove all existing Bayesian tokens to start training over:
  
- To remove all existing Bayesian tokens to start training over:
  {{{
  rm ~/.spamassassin/bayes_*
  }}}
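
Before deleting the Bayes files, it may be worth saving a copy so the step can be undone. A minimal sketch using the {{{--backup}}} and {{{--restore}}} options of sa-learn follows; the function names and the `SA_LEARN` variable are hypothetical.

```shell
# Sketch: save the Bayes database before wiping it, so a mistake can be undone.
# SA_LEARN is an assumption that lets you point at a non-default sa-learn binary.
SA_LEARN="${SA_LEARN:-sa-learn}"

# Write a text dump of the current Bayes database to the given file.
backup_bayes() {
    "$SA_LEARN" --backup > "$1"
}

# Load a dump made by backup_bayes back into the database.
restore_bayes() {
    "$SA_LEARN" --restore "$1"
}
```

Run these as the same user whose Bayes database is in use, since the database location depends on the user.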
- 
  == Remove any SARE rules ==
  [[SareChannels|SARE]] rules have not been updated in years, and are therefore actively harmful.  They are not included in !SpamAssassin by default, but often have been added to local configurations.
  
@@ -52, +50 @@

  SoughtRules is a custom rule set generated from spam 4 times a day by a SpamAssassin developer.
  
  == Only accept email in specified languages - TextCat ==
- 
  In /etc/spamassassin/local.'''pre''' add:
  
  {{{loadplugin Mail::SpamAssassin::Plugin::TextCat}}}
@@ -64, +61 @@

  Where "en es" is a list of codes for languages you wish to accept.  The full list is in the [[http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Plugin_TextCat.html#ok_languages_xx_yy_zz_default_all|TextCat documentation]].
  
  It is very important that the loadplugin line be added to a .pre file, not a .cf file, so that it is loaded ''before'' the rules files are loaded; otherwise those rules will not be enabled.
+ 
+ You may also want to increase the score from the default of 2.8:
+ 
+ {{{score UNWANTED_LANGUAGE_BODY 5}}}
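
To catch a TextCat loadplugin line that ended up in a .cf file by mistake, a check along these lines could help. The function name is hypothetical, and the directory to scan is whatever holds your site configuration.

```shell
# Hypothetical check: list .cf files in a directory that load the TextCat plugin.
# Any file printed here should have its loadplugin line moved to a .pre file.
misplaced_textcat() {
    grep -l -E '^[[:space:]]*loadplugin[[:space:]]+Mail::SpamAssassin::Plugin::TextCat' \
        "$1"/*.cf 2>/dev/null
}
```

For example, `misplaced_textcat /etc/spamassassin` prints nothing when the plugin is loaded only from .pre files.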
  
  == Use a local, caching, non-forwarding DNS server ==
  CachingNameserver.
@@ -80, +81 @@

  
  == Pick a useful threshold ==
  The default threshold is 5, and the scores of all of the tests are calculated with that threshold in mind.  A higher threshold results in fewer emails being considered spam, which reduces false positives but increases false negatives.  Reducing the threshold below 5 is not recommended.  This is configured with:
+ 
  {{{
  required_score 5
  }}}
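
The effect of the threshold can be illustrated with a small comparison. `is_spam` is a hypothetical helper, not part of SpamAssassin; it only mirrors the rule that a message is treated as spam once its total score reaches required_score.

```shell
# Illustration only: is_spam is a hypothetical helper, not part of SpamAssassin.
# Succeeds when a message score reaches the threshold.
is_spam() {
    awk -v score="$1" -v threshold="$2" 'BEGIN { exit !(score >= threshold) }'
}
```

A message scoring 6.2 counts as spam with `is_spam 6.2 5` but not with `is_spam 6.2 7`, which is how raising the threshold trades missed spam for fewer false positives.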
- 
  == Filtration at your MTA ==
  While outside the scope of SpamAssassin, it is common to do some configuration at your MTA to reject invalid mail.
  
@@ -94, +95 @@

  
  == Find other missing perl modules ==
  You may be able to find other perl modules you can install to help SpamAssassin by running:
+ 
  {{{
  spamassassin -D --lint 2>&1 | grep -i failed
  }}}
- 
  == Writing Rules ==
- 
  WritingRules - when existing tests are not sufficient.
  
  == SpamTips.org setup guide ==