Posted to users@spamassassin.apache.org by Bryan Lee <Br...@Gosolutions.com> on 2010/04/30 22:02:14 UTC
Poisoning my own Ham // SpamAssassin won't implement local config changes
The setup:
Ubuntu 9.04 Jaunty system running postfix, amavis, clamav, and
spamassassin.
These systems(2) serve as relay and filtering gateways into a
proprietary custom mail system.
Problem 1)
SpamAssassin won't implement local config changes.
I have tried making changes to /etc/spamassassin/local.cf and creating
/etc/spamassassin/custom_rule.cf but new rules in either are ignored.
Configuration settings in both also appear to be ignored.
I have run:
sudo spamassassin -D --lint 2>&1 |less
And the output contains:
[29713] dbg: config: using "/etc/spamassassin" for site rules dir
[29713] dbg: config: read file /etc/spamassassin/65_debian.cf
[29713] dbg: config: read file /etc/spamassassin/custom_rule.cf
[29713] dbg: config: read file /etc/spamassassin/local.cf
I HAVE reloaded spamassassin afterwards:
> sudo /etc/init.d/spamassassin reload
Reloading SpamAssassin Mail Filter Daemon: spamd.
Changes I've made to /etc/spamassassin/local.cf include:
# To exclude my local networks (see below)
internal_networks !0/0
# To turn off auto learning
bayes_auto_learn 0
# To ignore soft whitelisting of trusted hosts
score ALL_TRUSTED 0.0
# To give a high rating to messages with this subject, a test
header LOCAL_EVERYTHINGYOUNEED_RULE Subject=~ /Everything you need, you can find here/i
score LOCAL_EVERYTHINGYOUNEED_RULE 3.0
describe LOCAL_EVERYTHINGYOUNEED_RULE Everything you need
My /etc/spamassassin/custom_rule.cf contains the same "Everything you
need" lines as above.
Results -- Messages containing "Everything you need, you can find here"
get through with these headers added:
X-Spam-Flag: NO
X-Spam-Score: -1.44
X-Spam-Score: -3.037
X-Spam-Level:
X-Spam-Status: No, score=-3.037 tagged_above=-9999 required=4
tests=[ALL_TRUSTED=-1.8, AWL=1.362, BAYES_00=-2.599] autolearn=ham
(Please ignore the fact that ALL_TRUSTED and BAYES_00 are subtracting
from the score for now, it's explained in problem 2. Here, the
LOCAL_EVERYTHINGYOUNEED_RULE doesn't even show up.)
Note that ONE of the X-Spam-Score headers is generated by a second
instance of SpamAssassin running on the proprietary custom mail system,
BUT the values in the X-Spam-Status header ARE being generated by my
gateway.
* Does anyone have any ideas why my local.cf and custom_rule.cf appear
to be ignored, despite showing up when --linting?
----------
Problem 2)
Setup is above.
In front of these mail gateways is a load balancer. Because of this,
ALL incoming messages appear to be coming from the load balancer, and
therefore a server on my inside (and trusted) network.
I have removed my trusted network from the amavis "mynetworks" config,
but SpamAssassin still thinks it's trusted, hence the changes attempted
above.
Because a large number of messages were tagged as trusted and let
through, and because autolearning is turned on, Bayes is learning these
messages incorrectly as ham! (Poisoned ham.)
I AM able to run
sa-learn --clear
to clear the database, but BEFORE that and now I get
sa-learn --dump
config: path "/home/blee/.spamassassin/user_prefs" is inaccessible: Permission denied
ERROR: Bayes dump returned an error, please re-run with -D for more information
With -D
sa-learn --dump -D
We get the following lines that I think are of interest:
...
[30577] dbg: plugin: loading Mail::SpamAssassin::Plugin::Bayes from @INC
...
[30577] dbg: conf: finish parsing
[30577] dbg: plugin: Mail::SpamAssassin::Plugin::ReplaceTags=HASH(0x90fc5f8) implements 'finish_parsing_end', priority 0
[30577] dbg: replacetags: replacing tags
[30577] dbg: replacetags: done replacing tags
[30577] dbg: Bayes: no dbs present, cannot tie DB R/O: /home/blee/.spamassassin/bayes_toks
[30577] dbg: config: score set 1 chosen.
[30577] dbg: Bayes: no dbs present, cannot tie DB R/O: /home/blee/.spamassassin/bayes_toks
ERROR: Bayes dump returned an error, please re-run with -D for more information
* How do I find out what is currently in my Bayes database or if it
even exists?
To make this even more complicated, mail headers are added after
SpamAssassin and then the proprietary custom mail system chops messages
up into parts. Part .0 is the headers, part .1 is the main body, and
parts .2 - .999 are attachments. So I don't even have access to the
original email in its entirety!
* Is there any way to make Bayes relearn or even just unlearn a message
based on a MessageID or something else?
* Will it be useful to just feed the message bodies to sa-learn as
--spam without the original headers and MIME separators?
I've been banging my head on the wall about this for 2 days, so any help
will be greatly appreciated.
--Bryan
Re: Poisoning my own Ham // SpamAssassin won't implement local config changes
Posted by Kris Deugau <kd...@vianet.ca>.
Bryan Lee wrote:
> The setup:
> Ubuntu 9.04 Jaunty system running postfix, amavis, clamav, and
> spamassassin.
> Problem 1)
> SpamAssassin won't implement local config changes.
> I HAVE reloaded spamassassin afterwards:
> > sudo /etc/init.d/spamassassin reload
> Reloading SpamAssassin Mail Filter Daemon: spamd.
If you're using Amavis, you're almost certainly not using spamd. Amavis
typically loads the SA Perl libraries directly, so if you change your SA
config you need to restart Amavis.
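As a rough sketch of that workflow: lint the config first, and restart the scanner only when lint passes. The helper name apply_sa_config is made up for illustration, and the default restart command assumes the Debian/Ubuntu amavisd-new init script is /etc/init.d/amavis; check what your system actually uses.

```shell
# Lint the SpamAssassin config, then restart the daemon that actually loads it.
# Both commands are parameters so the sketch can be tried without root;
# the defaults are assumptions for a Debian/Ubuntu amavisd-new install.
apply_sa_config() {
    lint_cmd=${1:-"spamassassin --lint"}
    restart_cmd=${2:-"/etc/init.d/amavis restart"}

    if $lint_cmd; then
        # Config parsed cleanly: restart so the new rules are loaded.
        $restart_cmd
    else
        echo "lint failed; not restarting" >&2
        return 1
    fi
}
```

Called with no arguments as root, it uses the defaults; passing harmless commands (for example "true" and "echo restarted") lets you dry-run the logic.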
> Problem 2)
> Setup is above.
>
> In front of these mail gateways is a load balancer. Because of this,
> ALL incoming messages appear to be coming from the load balancer, and
> therefore a server on my inside (and trusted) network.
> I have removed my trusted network from the amavis "mynetworks" config,
> but SpamAssassin still thinks it's trusted, hence the changes attempted
> above.
Eww. I know it's *possible* to configure a load-balanced system to pass
the connecting IP through to the individual hosts in the cluster,
because the core mail systems here all sit behind a Linux-based load
balancer and app-level IP checks (eg Postfix reject on Spamhaus Zen)
work fine. Check your config or call the vendor for support.
If fixing the LB isn't an option for some reason, you might be able to
sort of work around things by adding the LB as a trusted IP.
Unfortunately, if it doesn't add any header data, you're stuck because
the info about the real remote IP has been lost. DNSBLs will be of
limited use, if any, in that case. :( The AWL will also be pretty much
useless since all of your mail appears to come from the same IP.
You might also want to try dropping the LB and just running both
machines as equal-priority MXes.
> Because a large number of messages were tagged as trusted and let
> through, and because autolearning is turned on, Bayes is learning these
> messages incorrectly as ham! (Poisoned ham.)
>
> I AM able to run
> sa-learn --clear
> to clear the database, but BEFORE that and now I get
> sa-learn --dump
> config: path "/home/blee/.spamassassin/user_prefs" is
> inaccessible: Permission denied
> ERROR: Bayes dump returned an error, please re-run with -D for
> more information
Check ownership and permissions, and make sure they match up with the
user Amavis is running as. Make sure you're running these commands in
the shell as that user, too.
Supposedly it's possible to do some per-user Bayes magic with direct
library-callers like Amavis, but most approaches require severe hackery
(often to get the caller to use spamc/spamd).
> [30577] dbg: Bayes: no dbs present, cannot tie DB R/O:
> /home/blee/.spamassassin/bayes_toks
> * How do I find out what is currently in my Bayes database or if it
> even exists?
ls -l /home/blee/.spamassassin/, and see what files show up. To truly
start fresh, you may want to clear the DB by actually deleting the
bayes_* files instead of using "sa-learn --clear".
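A small sketch of that check, for whichever directory turns out to hold the database. The helper name list_bayes_files is hypothetical. With Amavis the directory is usually under the Amavis user's home rather than your own (on Debian/Ubuntu that is often /var/lib/amavis/.spamassassin, but treat that path as an assumption and verify it).

```shell
# Print the Bayes database files found in a SpamAssassin state directory.
# Returns non-zero when none exist, which corresponds to the
# "no dbs present" debug message from sa-learn.
list_bayes_files() {
    dir=${1:-"$HOME/.spamassassin"}
    found=1
    for f in "$dir"/bayes_*; do
        # An unmatched glob stays literal, so skip anything that is not a real file.
        [ -e "$f" ] || continue
        ls -l "$f"
        found=0
    done
    [ "$found" -eq 0 ] || echo "no bayes_* files in $dir" >&2
    return "$found"
}
```

Run it as the user that owns the directory (or as root), passing the directory as the only argument.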
> To make this even more complicated, mail headers are added after
> SpamAssassin and then the proprietary custom mail system chops messages
> up into parts. Part .0 is the headers, part .1 is the main body, and
> parts .2 - .999 are attachments. So I don't even have access to the
> original email in its entirety!
How badly mangled are the various parts? If they're "just" split up, it
should be possible to reconstruct a message that's enough like the
original to not matter to SA.
> * Is there any way to make Bayes relearn or even just unlearn a message
> based on a MessageID or something else?
Not really, no. The MessageId is just a flag that says "I've learned
from this message"; once a message has been through, there's no link
between the MessageId and the tokens from that message.
> * Will it be useful to just feed the message bodies to sa-learn as
> --spam without the original headers and MIME separators?
Possibly, but without a MessageId it'll likely get messy. Even taking
the headers (minus any MIME multipart references) plus the first body
component is probably better than a bare body.
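A minimal sketch of that kind of reassembly, assuming the header part and first body part are available as two plain files (the helper name rebuild_message and any file names are hypothetical, and your mail system's part format may need more cleanup than this). It copies the headers, drops any Content-Type header that declares multipart along with its folded continuation lines, then appends a blank line and the body.

```shell
# Rebuild an approximate message from a header file and the first body part.
# Multipart Content-Type headers are dropped because the boundaries they
# reference no longer exist in the split-up body.
rebuild_message() {
    headers=$1
    body=$2
    awk '
        /^$/     { exit }                      # stop at the end of the header block
        /^[ \t]/ { if (!skip) print; next }    # folded line follows its header
                 { skip = (tolower($0) ~ /^content-type:[ \t]*multipart\//) }
        !skip    { print }
    ' "$headers"
    printf '\n'                                # blank line separates headers and body
    cat "$body"
}
```

The output can be redirected to a file and that file passed to sa-learn --spam, run as the same user Amavis uses.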
-kgd