Posted to users@spamassassin.apache.org by Kris Deugau <kd...@vianet.ca> on 2010/04/30 22:46:47 UTC

Re: Poisoning my own Ham // SpamAssassin won't implement local config changes

Bryan Lee wrote:
> The setup:
> Ubuntu 9.04 Jaunty system running postfix, amavis, clamav, and
> spamassassin.

> Problem 1)
> 	SpamAssassin won't implement local config changes.

> I HAVE reloaded spamassassin afterwards:
> 	> sudo /etc/init.d/spamassassin reload
> 	Reloading SpamAssassin Mail Filter Daemon: spamd.

If you're using Amavis, you're almost certainly not using spamd.  Amavis 
typically loads the SA Perl libraries directly, so if you change your SA 
config you need to restart Amavis.

> Problem 2)
> Setup is above.
> 
> In front of these mail gateways is a load balancer.  Because of this,
> ALL incoming messages appear to be coming from the load balancer, and
> therefore a server on my inside (and trusted) network.
> I have removed my trusted network from the amavis "mynetworks" config,
> but SpamAssassin still thinks it's trusted, hence the changes attempted
> above.

Eww.  I know it's *possible* to configure a load-balanced system to pass 
the connecting IP through to the individual hosts in the cluster, 
because the core mail systems here all sit behind a Linux-based load 
balancer and app-level IP checks (e.g. Postfix rejects based on Spamhaus 
ZEN) work fine.  Check your config or call the vendor for support.

If fixing the LB isn't an option for some reason, you might be able to 
sort of work around things by adding the LB as a trusted IP. 
Unfortunately, if the LB doesn't add any header data (such as a 
Received: header for the connecting client), you're stuck, because the 
info about the real remote IP has been lost.  DNSBLs will be of limited 
use, if any, in that case.  :(  The AWL will also be pretty much useless, 
since all of your mail appears to come from the same IP.
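A minimal local.cf sketch of that workaround, assuming the LB is a single host. The address 192.0.2.10 is a placeholder, not anything from this setup; trusted_networks is a standard SA setting, and as with any SA config change, Amavis has to be restarted to pick it up.

```
# /etc/spamassassin/local.cf (path assumed: the system-wide SA config)
# 192.0.2.10 is a placeholder; substitute the load balancer's real IP.
trusted_networks 192.0.2.10
```

This only helps if the LB records the connecting client in a header SA can parse; if it doesn't, SA still can't see past the LB.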

You might also want to try dropping the LB and just running both 
machines as equal-priority MXes.

> Because a large number of messages were tagged as trusted and let
> through, and because autolearning is turned on, Bayes is learning these
> messages incorrectly as ham!  (Poisoned ham.)
> 
> I AM able to run 
> 	sa-learn --clear
> to clear the database, but BEFORE that and now I get
> 	sa-learn --dump
> 	config: path "/home/blee/.spamassassin/user_prefs" is inaccessible: Permission denied
> 	ERROR: Bayes dump returned an error, please re-run with -D for more information

Check ownership and permissions, and make sure they match up with the 
user Amavis is running as.  Make sure you're running these commands in 
the shell as that user, too.
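A rough shell sketch of that check, assuming GNU stat (as on Ubuntu). check_owner is a hypothetical helper, and the "amavis" user name and /var/lib/amavis home in the comments are the usual Ubuntu amavisd-new defaults; that is an assumption to verify on your system (e.g. with getent passwd amavis).

```shell
#!/bin/sh
# Hypothetical helper: report whether a directory is owned by the account
# the filter runs as. "stat -c %U" is GNU coreutils syntax.
check_owner() {
  dir=$1
  want=$2
  have=$(stat -c %U "$dir") || return 2   # missing or unreadable path
  if [ "$have" = "$want" ]; then
    echo "ok: $dir is owned by $want"
  else
    echo "mismatch: $dir is owned by $have, expected $want"
    return 1
  fi
}

# Assumed Ubuntu defaults (user "amavis", home /var/lib/amavis):
# check_owner /var/lib/amavis/.spamassassin amavis
# Then run sa-learn as that user, with that user's home directory:
# sudo -u amavis -H sa-learn --dump magic
```

The -H matters: it makes sudo set HOME to the target user's home, so sa-learn looks in that user's ~/.spamassassin instead of yours.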

Supposedly it's possible to do some per-user Bayes magic with direct 
library-callers like Amavis, but most methods require severe hackery 
(often to get the caller to use spamc/spamd).

> 	[30577] dbg: Bayes: no dbs present, cannot tie DB R/O: /home/blee/.spamassassin/bayes_toks

> *  How do I find out what is currently in my Bayes database or if it
> even exists?

ls -l /home/blee/.spamassassin/, and see what files show up.  To truly 
start fresh, you may want to clear the DB by actually deleting the 
bayes_* files instead of using "sa-learn --clear".
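A small shell sketch of both steps: list whatever bayes_* files exist, then delete them. reset_bayes is a hypothetical helper name; the directory in the example comment is the one from the debug output above, so point it at the Bayes directory of whichever user Amavis actually runs as.

```shell
#!/bin/sh
# Hypothetical helper: show the Bayes files in a directory, then delete
# them for a truly fresh start. Returns 1 if there was nothing to delete.
reset_bayes() {
  dir=$1
  found=no
  for f in "$dir"/bayes_*; do
    [ -e "$f" ] || continue   # the glob matched nothing
    ls -l "$f"
    found=yes
  done
  if [ "$found" = no ]; then
    echo "no bayes_* files in $dir"
    return 1
  fi
  rm -f "$dir"/bayes_*
}

# Example: reset_bayes /home/blee/.spamassassin
```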

> To make this even more complicated, mail headers are added after
> SpamAssassin and then the proprietary custom mail system chops messages
> up into parts.  Part .0 is the headers, part.1 is the main body, and
> parts .2 - .999 are attachments.  SO I don't even have access to the
> original email in its entirety!

How badly mangled are the various parts?  If they're "just" split up, it 
should be possible to reconstruct a message that's enough like the 
original to not matter to SA.

> *  Is there any way to make Bayes relearn or even just unlearn a message
> based on a MessageID or something else?

Not really, no.  The MessageId is just a flag that says "I've learned 
from this message"; once a message has been learned, there's no link 
between the MessageIds and the tokens that came from it.

> *  Will it be useful to just feed the message bodies to sa-learn as
> --spam without the original headers and MIME separators?

Possibly, but without a MessageId it'll likely get messy.  Even taking 
the headers (minus any MIME multipart references) plus the first body 
component is probably better than a bare body.
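A rough shell sketch of that approach, assuming part .0 holds the original header block and part .1 holds the main body as plain text; the msg.0/msg.1 file names and the rebuild_message helper are hypothetical. It keeps headers such as Message-ID, drops a multipart Content-Type header (along with its folded boundary= line), and leaves out the attachment parts.

```shell
#!/bin/sh
# Hypothetical helper: glue the header part and the first body part back
# into a single message that sa-learn can read.
rebuild_message() {
  hdr=$1
  body=$2
  # Copy the headers, skipping blank lines and any multipart Content-Type
  # header together with its indented continuation lines.
  awk '
    /^$/ { next }
    /^[^ \t]/ { skip = (tolower($0) ~ /^content-type:[ \t]*multipart\//) }
    !skip { print }
  ' "$hdr"
  echo          # a single blank line ends the header block
  cat "$body"
}

# Example, run as the user Amavis runs as:
# rebuild_message msg.0 msg.1 > rebuilt.eml
# sa-learn --spam rebuilt.eml
```

Keeping Message-ID matters here: as described above, it's what sa-learn records to recognise a message it has already learned.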

-kgd