Posted to users@spamassassin.apache.org by Bryan Lee <Br...@Gosolutions.com> on 2010/04/30 22:02:14 UTC

Poisoning my own Ham // SpamAssassin won't implement local config changes

The setup:
Ubuntu 9.04 Jaunty system running postfix, amavis, clamav, and
spamassassin.
These two systems serve as relay and filtering gateways into a
proprietary custom mail system.

Problem 1)
	SpamAssassin won't implement local config changes.

I have tried making changes to /etc/spamassassin/local.cf and creating
/etc/spamassassin/custom_rule.cf but new rules in either are ignored.
Configuration settings in both also appear to be ignored.

I have run:
	sudo spamassassin -D --lint 2>&1 |less
And the output contains:
	[29713] dbg: config: using "/etc/spamassassin" for site rules dir
	[29713] dbg: config: read file /etc/spamassassin/65_debian.cf
	[29713] dbg: config: read file /etc/spamassassin/custom_rule.cf
	[29713] dbg: config: read file /etc/spamassassin/local.cf

I HAVE reloaded spamassassin afterwards:
	> sudo /etc/init.d/spamassassin reload
	Reloading SpamAssassin Mail Filter Daemon: spamd.
	
Changes I've made to /etc/spamassassin/local.cf include:

# To exclude my local networks (see below)
	internal_networks !0/0
# To turn off auto learning
	bayes_auto_learn 0
# To ignore soft whitelisting of trusted hosts
	score ALL_TRUSTED 0.0
# To give a high rating to messages with this subject, a test
	header LOCAL_EVERYTHINGYOUNEED_RULE Subject=~ /Everything you need, you can find here/i
	score LOCAL_EVERYTHINGYOUNEED_RULE 3.0
	describe LOCAL_EVERYTHINGYOUNEED_RULE Everything you need

My /etc/spamassassin/custom_rule.cf contains the same "Everything you
need" lines as above.

Results --  Messages containing "Everything you need, you can find here"
get through with these headers added:
	X-Spam-Flag: NO
	X-Spam-Score: -1.44
	X-Spam-Score: -3.037
	X-Spam-Level:
	X-Spam-Status: No, score=-3.037 tagged_above=-9999 required=4
tests=[ALL_TRUSTED=-1.8, AWL=1.362, BAYES_00=-2.599] autolearn=ham

(Please ignore the fact that ALL_TRUSTED and BAYES_00 are subtracting
from the score for now, it's explained in problem 2.  Here, the
LOCAL_EVERYTHINGYOUNEED_RULE doesn't even show up.)

Note that ONE of the X-Spam-Score headers is generated by a second instance of
SpamAssassin running on the proprietary custom mail system, BUT, the
values in the X-Spam-Status ARE being generated by my gateway.


*  Does anyone have any ideas why my local.cf and custom_rule.cf appear
to be ignored, despite showing up when --linting?


----------

Problem 2)
Setup is above.

In front of these mail gateways is a load balancer.  Because of this,
ALL incoming messages appear to be coming from the load balancer, and
therefore a server on my inside (and trusted) network.
I have removed my trusted network from the amavis "mynetworks" config,
but SpamAssassin still thinks it's trusted, hence the changes attempted
above.

Because a large number of messages were tagged as trusted and let
through, and because autolearning is turned on, Bayes is learning these
messages incorrectly as ham!  (Poisoned ham.)

I AM able to run 
	sa-learn --clear
to clear the database, but both BEFORE that and now I get
	sa-learn --dump
	config: path "/home/blee/.spamassassin/user_prefs" is inaccessible: Permission denied
	ERROR: Bayes dump returned an error, please re-run with -D for more information

With -D
	sa-learn --dump -D 
We get the following lines that I think are of interest:
		...
	[30577] dbg: plugin: loading Mail::SpamAssassin::Plugin::Bayes from @INC
		...
	[30577] dbg: conf: finish parsing
	[30577] dbg: plugin: Mail::SpamAssassin::Plugin::ReplaceTags=HASH(0x90fc5f8) implements 'finish_parsing_end', priority 0
	[30577] dbg: replacetags: replacing tags
	[30577] dbg: replacetags: done replacing tags
	[30577] dbg: Bayes: no dbs present, cannot tie DB R/O: /home/blee/.spamassassin/bayes_toks
	[30577] dbg: config: score set 1 chosen.
	[30577] dbg: Bayes: no dbs present, cannot tie DB R/O: /home/blee/.spamassassin/bayes_toks
	ERROR: Bayes dump returned an error, please re-run with -D for more information


*  How do I find out what is currently in my Bayes database or if it
even exists?



To make this even more complicated, mail headers are added after
SpamAssassin and then the proprietary custom mail system chops messages
up into parts.  Part .0 is the headers, part .1 is the main body, and
parts .2 - .999 are attachments.  SO I don't even have access to the
original email in its entirety!

*  Is there any way to make Bayes relearn or even just unlearn a message
based on a MessageID or something else?
*  Will it be useful to just feed the message bodies to sa-learn as
--spam without the original headers and MIME separators?


I've been banging my head on the wall about this for 2 days, so any help
will be greatly appreciated.

--Bryan

Re: Poisoning my own Ham // SpamAssassin won't implement local config changes

Posted by Kris Deugau <kd...@vianet.ca>.
Bryan Lee wrote:
> The setup:
> Ubuntu 9.04 Jaunty system running postfix, amavis, clamav, and
> spamassassin.

> Problem 1)
> 	SpamAssassin won't implement local config changes.

> I HAVE reloaded spamassassin afterwards:
> 	> sudo /etc/init.d/spamassassin reload
> 	Reloading SpamAssassin Mail Filter Daemon: spamd.

If you're using Amavis, you're almost certainly not using spamd.  Amavis 
typically loads the SA Perl libraries directly, so if you change your SA 
config you need to restart Amavis.
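
As a rough sketch of what that looks like in practice: lint the config first, then restart Amavis rather than spamd. The init script path /etc/init.d/amavis and the helper name restart_amavis_if_lint_ok are assumptions for illustration (the path is what the Debian/Ubuntu amavisd-new package normally installs); adjust them for your system.

```shell
# Assumption: the amavisd-new init script lives at /etc/init.d/amavis.
AMAVIS_INIT="${AMAVIS_INIT:-/etc/init.d/amavis}"

# Hypothetical helper: lint the SpamAssassin config and restart Amavis
# only if the lint passes, so config errors are caught before the restart.
restart_amavis_if_lint_ok() {
    if spamassassin --lint; then
        sudo "$AMAVIS_INIT" restart
    else
        echo "spamassassin --lint failed; not restarting Amavis" >&2
        return 1
    fi
}
```

Calling restart_amavis_if_lint_ok after each edit to local.cf should make new rules and scores show up in X-Spam-Status on the next message that passes through the gateway.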

> Problem 2)
> Setup is above.
> 
> In front of these mail gateways is a load balancer.  Because of this,
> ALL incoming messages appear to be coming from the load balancer, and
> therefore a server on my inside (and trusted) network.
> I have removed my trusted network from the amavis "mynetworks" config,
> but SpamAssassin still thinks it's trusted, hence the changes attempted
> above.

Eww.  I know it's *possible* to configure a load-balanced system to pass 
the connecting IP through to the individual hosts in the cluster, 
because the core mail systems here all sit behind a Linux-based load 
balancer and app-level IP checks (eg Postfix reject on Spamhaus Zen) 
work fine.  Check your config or call the vendor for support.

If fixing the LB isn't an option for some reason, you might be able to 
sort of work around things by adding the LB as a trusted IP. 
Unfortunately, if it doesn't add any header data, you're stuck because 
the info about the real remote IP has been lost.  DNSBLs will be of 
limited use, if any, in that case.  :(  The AWL will also be pretty much 
useless since all of your mail appears to come from the same IP.

You might also want to try dropping the LB and just running both 
machines as equal-priority MXes.

> Because a large number of messages were tagged as trusted and let
> through, and because autolearning is turned on, Bayes is learning these
> messages incorrectly as ham!  (Poisoned ham.)
> 
> I AM able to run 
> 	sa-learn --clear
> to clear the database, but both BEFORE that and now I get
> 	sa-learn --dump
> 	config: path "/home/blee/.spamassassin/user_prefs" is inaccessible: Permission denied
> 	ERROR: Bayes dump returned an error, please re-run with -D for more information

Check ownership and permissions, and make sure they match up with the 
user Amavis is running as.  Make sure you're running these commands in 
the shell as that user, too.
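
One way to do that, sketched under the assumption that Amavis runs as an account named "amavis" (the Debian/Ubuntu package default; check with ps if unsure). The wrapper name as_amavis is made up for this example.

```shell
# Assumption: Amavis runs as the "amavis" account.
AMAVIS_USER="${AMAVIS_USER:-amavis}"

# Hypothetical wrapper: run a command as the Amavis user. The -H flag sets
# HOME to that user's home directory, so sa-learn looks in the same
# ~/.spamassassin directory that Amavis itself uses.
as_amavis() {
    sudo -u "$AMAVIS_USER" -H "$@"
}
```

For example, "as_amavis sa-learn --dump magic" should print the Bayes summary counters for the database Amavis is actually using, instead of looking under /home/blee.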

Supposedly it's possible to do some per-user Bayes magic with direct 
library-callers like Amavis, but most require severe hackery (often to 
get the caller to use spamc/spamd).

> 	[30577] dbg: Bayes: no dbs present, cannot tie DB R/O: /home/blee/.spamassassin/bayes_toks

> *  How do I find out what is currently in my Bayes database or if it
> even exists?

ls -l /home/blee/.spamassassin/, and see what files show up.  To truly 
start fresh, you may want to clear the DB by actually deleting the 
bayes_* files instead of using "sa-learn --clear".
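
A small sketch of that check, for the default file-based Bayes storage only. The function name list_bayes_files is hypothetical; it takes the directory to inspect as an optional argument and falls back to the current user's ~/.spamassassin.

```shell
# Hypothetical helper: list any Bayes database files in a SpamAssassin
# state directory. Defaults to ~/.spamassassin for the current user; pass
# another directory (for example the Amavis user's) as the first argument.
list_bayes_files() {
    dir="${1:-$HOME/.spamassassin}"
    if ls -l "$dir"/bayes_* 2>/dev/null; then
        return 0
    fi
    echo "no bayes_* files in $dir"
    return 1
}
```

On a Debian-style install the Amavis user's directory is probably /var/lib/amavis/.spamassassin (an assumption, not something confirmed in this thread), so "sudo ls" on that path or running the function as that user is the place to look.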

> To make this even more complicated, mail headers are added after
> SpamAssassin and then the proprietary custom mail system chops messages
> up into parts.  Part .0 is the headers, part .1 is the main body, and
> parts .2 - .999 are attachments.  SO I don't even have access to the
> original email in its entirety!

How badly mangled are the various parts?  If they're "just" split up, it 
should be possible to reconstruct a message that's enough like the 
original to not matter to SA.

> *  Is there any way to make Bayes relearn or even just unlearn a message
> based on a MessageID or something else?

Not really, no.  The MessageId is just a flag that says "I've learned 
from this message";  once a message has been through there's no link 
between the MessageIds and the tokens from that message.

> *  Will it be useful to just feed the message bodies to sa-learn as
> --spam without the original headers and MIME separators?

Possibly, but without a MessageId it'll likely get messy.  Even taking 
the headers (minus any MIME multipart references) plus the first body 
component is probably better than a bare body.
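
A minimal sketch of that reassembly, assuming the header part and the first body part are available as two plain files. The function name rebuild_message and the file names are hypothetical, and it only handles the common case where "multipart/..." sits on the same line as "Content-Type:".

```shell
# Rebuild something close to the original message from split parts.
# $1: file holding the headers, $2: file holding the first body part
# (both file names are hypothetical; use whatever the mail store calls them).
rebuild_message() {
    headers="$1"
    body="$2"
    # Copy the headers, dropping any multipart Content-Type header together
    # with its folded continuation lines (where the boundary= usually sits).
    awk '
        /^[ \t]/ { if (!skip) print; next }   # continuation of previous header
        /^$/     { exit }                      # blank line ends the headers
        { skip = (tolower($0) ~ /^content-type:[ \t]*multipart\//)
          if (!skip) print }
    ' "$headers"
    printf "\n"      # blank line separating headers from body
    cat "$body"
}
```

Usage would be along the lines of "rebuild_message msg.0 msg.1 > rebuilt.eml" followed by "sa-learn --spam rebuilt.eml", run as the same user Amavis runs as so the tokens land in the right database.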

-kgd