Posted to commits@spamassassin.apache.org by Apache Wiki <wi...@apache.org> on 2014/09/25 22:18:21 UTC
[Spamassassin Wiki] Update of "ImproveAccuracy" by Darxus
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Spamassassin Wiki" for change notification.
The "ImproveAccuracy" page has been changed by Darxus:
https://wiki.apache.org/spamassassin/ImproveAccuracy?action=diff&rev1=12&rev2=13
Comment:
Changing the score for TextCat
= How to Improve SpamAssassin Accuracy =
-
== Run a recent version ==
- Regular updates of SpamAssassin 3.2.x rules stopped in 2008. Accuracy depends on more recent rules. Upgrade to 3.3.0 or newer.
+ Regular updates of SpamAssassin 3.2.x rules stopped in 2008. Accuracy depends on more recent rules. Upgrade to 3.3.0 or newer.
== Run sa-update daily ==
This is often set up by SpamAssassin packages, but if it is not, run sa-update daily from cron to fetch the latest SpamAssassin rules, which are generated every day.
@@ -11, +10 @@
(On Debian based systems, set "CRON=1" in /etc/default/spamassassin - this is not the default.)
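A minimal sketch of such a cron job in POSIX sh, assuming a hypothetical script like /etc/cron.daily/sa-update-local and a systemd-managed daemon. The function name {{{update_rules}}}, the variables {{{SA_UPDATE}}} and {{{RELOAD_CMD}}}, and the service name are illustrative assumptions, not part of SpamAssassin. It relies on sa-update's exit status (0 when new rules were installed, 1 when none were available, higher on error), so spamd is only reloaded when something changed:

{{{
# Hypothetical daily cron script (e.g. /etc/cron.daily/sa-update-local).
# sa-update exit status: 0 = new rules installed, 1 = no updates available,
# anything higher = an error.
SA_UPDATE=${SA_UPDATE:-sa-update}
# Assumed reload command; the service may be named "spamd" on your system.
# Deliberately expanded unquoted below so it splits into command + arguments.
RELOAD_CMD=${RELOAD_CMD:-"systemctl reload spamassassin"}

update_rules() {
    rc=0
    $SA_UPDATE || rc=$?
    case $rc in
        0) echo "new rules installed, reloading spamd"
           $RELOAD_CMD ;;
        1) echo "no new rules available" ;;
        *) echo "sa-update failed with status $rc" >&2
           return "$rc" ;;
    esac
}
}}}

Saved as a script, it would end with a call to {{{update_rules}}}. The reload matters because a running spamd keeps using the rules it loaded at startup until it is reloaded or restarted.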
== Enable network rules ==
- This is the default, but disabling network rules (including DNS rules) causes SpamAssassin to be wrong on about 3 times more emails. Network tests may have been disabled by running spamassassin or spamd with the command line arguments {{{-L}}} or {{{--local}}}. DNS rules may have been disabled with "{{{dns_available no}}}" in local.cf.
+ This is the default, but disabling network rules (including DNS rules) causes SpamAssassin to be wrong on about 3 times as many emails. Network tests may have been disabled by running spamassassin or spamd with the command line options {{{-L}}} or {{{--local}}}. DNS rules may have been disabled with "{{{dns_available no}}}" in local.cf. You should run a local caching DNS server for efficiency.
- You should run a local caching DNS server for efficiency.
As of 2011-12-21, without network tests, SpamAssassin is wrong 2.58 times as often on non-spam, and 3.40 times as often on spam.
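A rough way to check a system for the settings mentioned above, as a POSIX sh sketch. The function {{{check_network_settings}}} is hypothetical; it only looks for {{{-L}}} or {{{--local}}} as separate words in an options string, and for a {{{dns_available no}}} line in one config file:

{{{
# Hypothetical check for settings that disable SpamAssassin network tests.
# $1: config file to inspect (e.g. /etc/spamassassin/local.cf)
# $2: the options spamd/spamassassin is started with
# Returns 1 if anything was found, 0 otherwise.
check_network_settings() {
    cf_file=$1
    opts=$2
    found=0
    # -L and --local both restrict SpamAssassin to local tests only.
    # Only separate words are detected, not bundled short options.
    for opt in $opts; do
        case $opt in
            -L|--local) echo "network tests disabled by option $opt"; found=1 ;;
        esac
    done
    # "dns_available no" turns off the DNS-based rules.
    if grep -Eq '^[[:space:]]*dns_available[[:space:]]+no([[:space:]]|$)' "$cf_file"; then
        echo "DNS rules disabled in $cf_file"
        found=1
    fi
    return $found
}
}}}

On Debian based systems the spamd options are usually kept in the OPTIONS variable of /etc/default/spamassassin, so one possible invocation is {{{. /etc/default/spamassassin; check_network_settings /etc/spamassassin/local.cf "$OPTIONS"}}}.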
== Install Pyzor and Razor ==
These are two helper applications with useful (network) rules. If they're installed correctly, the debug output of SpamAssassin will include:
+
{{{
Apr 14 16:24:37.315 [4709] dbg: plugin: loading Mail::SpamAssassin::Plugin::Pyzor from @INC
Apr 14 16:24:37.318 [4709] dbg: pyzor: network tests on, attempting Pyzor
Apr 14 16:24:37.318 [4709] dbg: plugin: loading Mail::SpamAssassin::Plugin::Razor2 from @INC
Apr 14 16:24:37.381 [4709] dbg: razor2: razor2 is available, version 2.84
}}}
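To check for those lines without reading the whole debug log, the output can be filtered. This sketch assumes the message wording shown above; {{{check_helpers}}} is a hypothetical helper name:

{{{
# Hypothetical filter: reads SpamAssassin debug output on stdin and reports
# whether the Pyzor and Razor2 success messages shown above are present.
check_helpers() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'pyzor: network tests on'; then
        echo "pyzor: ok"
    else
        echo "pyzor: not detected"
    fi
    if printf '%s\n' "$log" | grep -q 'razor2: razor2 is available'; then
        echo "razor2: ok"
    else
        echo "razor2: not detected"
    fi
}
}}}

One possible invocation, where /path/to/message.eml stands for any saved test message, is {{{spamassassin -D < /path/to/message.eml 2>&1 >/dev/null | check_helpers}}}. The redirections send the debug output (stderr) into the pipe and discard the processed message.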
-
== Trusted Networks settings ==
Ensure that internal_networks and trusted_networks are set correctly. Often, spamassassin will intelligently do the correct thing by default. But if you're receiving a significant portion of your email via a trusted relay, it needs to be listed in one of these manually, otherwise the wrong hop will be used for things like DNS blacklist tests. More info at TrustPath.
== Verify AWL and the Bayesian classifier aren't poisoned ==
The AutoWhitelist, and the Bayesian classifier when it is trained automatically, can be trained incorrectly, causing email to be scored wrongly. Verify they are providing useful scores: positive scores for spam, and negative scores for ham (AWL and BAYES_* tests). If they are producing counterproductive scores, the only solutions are to delete the relevant databases and retrain them from scratch, or to disable them with:
+
{{{
use_auto_whitelist 0
use_bayes 0
}}}
+ To only disable automatic training of the Bayesian classifier:
- To only disable automatic training of the Bayesian classifier:
{{{
bayes_auto_learn 0
}}}
+ To remove all existing Bayesian tokens to start training over:
- To remove all existing Bayesian tokens to start training over:
{{{
rm ~/.spamassassin/bayes_*
}}}
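As an alternative to deleting the files by hand, {{{sa-learn --clear}}} wipes the Bayes database for the current user, after which it can be retrained from hand-sorted mail. A sketch, assuming hypothetical folders of verified spam and ham with one message per file ({{{retrain_bayes}}} and {{{SA_LEARN}}} are illustrative names):

{{{
# Hypothetical retraining helper; the folder arguments are placeholders
# for directories of hand-verified messages, one message per file.
SA_LEARN=${SA_LEARN:-sa-learn}

retrain_bayes() {
    spam_dir=$1
    ham_dir=$2
    # Start from an empty Bayes database (same effect as removing bayes_*).
    $SA_LEARN --clear || return
    # Relearn from the verified corpora; stop at the first failure.
    $SA_LEARN --spam "$spam_dir" || return
    $SA_LEARN --ham "$ham_dir" || return
    # Print the database statistics, including the nspam/nham counts.
    $SA_LEARN --dump magic
}
}}}

By default Bayes scores are not used until at least 200 spam and 200 ham messages have been learned (bayes_min_spam_num and bayes_min_ham_num), so each folder should hold at least that many messages.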
-
== Remove any SARE rules ==
[[SareChannels|SARE]] rules have not been updated in years, and are therefore actively harmful. They are not included in !SpamAssassin by default, but have often been added to local configurations.
@@ -52, +50 @@
SoughtRules is a custom rule set generated from spam 4 times a day by a SpamAssassin developer.
== Only accept email in specified languages - TextCat ==
-
In /etc/spamassassin/local.'''pre''' add:
{{{loadplugin Mail::SpamAssassin::Plugin::TextCat}}}
@@ -64, +61 @@
Where "en es" is a list of codes for languages you wish to accept. The full list is in the [[http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Plugin_TextCat.html#ok_languages_xx_yy_zz_default_all|TextCat documentation]].
It is very important that the loadplugin line be added to a .pre file, not a .cf file, so that it is loaded ''before'' the rules files; otherwise those rules will not be enabled.
+
+ You may also want to increase the score from the default of 2.8:
+
+ {{{score UNWANTED_LANGUAGE_BODY 5}}}
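Because a loadplugin line placed in a .cf file leaves the TextCat rules disabled, it can be worth checking where it ended up. A POSIX sh sketch, assuming the configuration lives in one directory such as /etc/spamassassin ({{{check_textcat}}} is a hypothetical name):

{{{
# Hypothetical check: is TextCat loaded from a .pre file, and is a
# loadplugin line for it wrongly sitting in any .cf file?
check_textcat() {
    conf_dir=$1
    # Anchored so that commented-out "# loadplugin" lines do not count.
    pattern='^[[:space:]]*loadplugin[[:space:]]+Mail::SpamAssassin::Plugin::TextCat'
    # -s hides the error when no .pre file exists (the glob stays literal).
    if grep -Eqs "$pattern" "$conf_dir"/*.pre; then
        echo "TextCat loaded from a .pre file"
    else
        echo "TextCat is not loaded from any .pre file"
    fi
    for f in "$conf_dir"/*.cf; do
        [ -e "$f" ] || continue
        if grep -Eq "$pattern" "$f"; then
            echo "warning: loadplugin for TextCat found in $f"
        fi
    done
}
}}}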
== Use a local, caching, non-forwarding DNS server ==
CachingNameserver.
@@ -80, +81 @@
== Pick a useful threshold ==
The default threshold is 5, and the scores of all of the standard tests are calculated for that threshold. Higher thresholds cause fewer emails to be considered spam, reducing false positives but increasing false negatives. Reducing the threshold below 5 is not recommended. This is configured with:
+
{{{
required_score 5
}}}
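The effect of a different threshold can be estimated on mail that has already been scanned, since SpamAssassin records the score in the X-Spam-Status header (for example {{{X-Spam-Status: No, score=4.2 required=5.0 tests=...}}}). A sketch, assuming that header format; {{{classify_at}}} is a hypothetical helper, and a message counts as spam when its score is at least the threshold:

{{{
# Hypothetical helper: reads one scanned message on stdin and prints how it
# would be classified at the threshold given as $1, followed by its score.
# awk is used because sh cannot compare decimal numbers.
classify_at() {
    awk -v t="$1" '
        # Stop at the blank line that ends the headers.
        /^$/ { exit }
        /^X-Spam-Status:/ && match($0, /score=-?[0-9.]+/) {
            # Skip the 6 characters of "score=" to get the number.
            s = substr($0, RSTART + 6, RLENGTH - 6) + 0
            verdict = (s >= t + 0) ? "spam" : "ham"
            print verdict, s
            found = 1
            exit
        }
        END { if (!found) print "no X-Spam-Status header" }'
}
}}}

For example, {{{classify_at 6 < saved-message.eml}}} (a hypothetical file) prints either {{{spam}}} or {{{ham}}} followed by the message's score.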
-
== Filtration at your MTA ==
While outside the scope of SpamAssassin, it is common to do some configuration at your MTA to reject invalid mail.
@@ -94, +95 @@
== Find other missing perl modules ==
You may be able to find other perl modules you can install to help SpamAssassin by running:
+
{{{
spamassassin -D --lint 2>&1 | grep -i failed
}}}
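To turn that output into a list of module names, the lines can be filtered further. This sketch assumes the failures are reported in the form {{{dbg: diag: module not installed: Net::Ident ('require' failed)}}}, which may differ between versions; {{{missing_modules}}} is a hypothetical name:

{{{
# Hypothetical filter: extracts module names from "module not installed"
# debug lines on stdin and prints each name once, sorted.
missing_modules() {
    sed -n 's/.*module not installed: \([^ ]*\).*/\1/p' | sort -u
}
}}}

It could be used as {{{spamassassin -D --lint 2>&1 | missing_modules}}}, after which each module can be installed from your distribution's packages or from CPAN.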
-
== Writing Rules ==
-
WritingRules - when existing tests are not sufficient.
== SpamTips.org setup guide ==