Posted to users@spamassassin.apache.org by Micah Anderson <mi...@riseup.net> on 2009/06/05 16:24:31 UTC

Bayes learning trusted networks mailing list email

I get a significant amount of spam through mailing lists that I am
legitimately subscribed to. It is either the administration emails
asking me whether I want to approve the "email" or not, or messages
that make it through the list.

These messages are either hitting ALL_TRUSTED, because they come from
mailing lists on my networks, or are tagged with a clear
untrusted-relays list. In other words, I've got my trusted_networks set up
so that SA knows about networks that I trust to be sending legitimate
email (they are not spam originators). Spam obviously still gets through,
but it originates from hops previous to these networks. If I understand
things properly, because I've got these set up in my trusted_networks,
then these previous hops will be checked in RBLs, so the spam is more
detectable. For example, the debian servers do send some spam to me, but
the Received: headers in the emails are correct, so if the server's
address is in trusted_networks, then SA will look up the address debian
got the email from in RBLs.  

What I am unsure of is if I am poisoning my bayes by reporting these
messages that make it through as spam. Should I be just deleting them?
The tokens that are legitimate that will end up as collateral damage are
going to be the list footers, the list administration messages, and
potentially other pieces.

I'm hoping I can identify why my bayes database is so bad (it thinks
everything is BAYES_00 now). If this is the reason, I will want to change
my training behavior.

thanks,
micah


Re: Bayes learning trusted networks mailing list email

Posted by RW <rw...@googlemail.com>.
On Fri, 05 Jun 2009 10:24:31 -0400
Micah Anderson <mi...@riseup.net> wrote:

> If I understand things properly, because I've got these
> setup in my trusted_networks, then these previous hops will be
> checked in RBLs, so the spam is more detectable.

That doesn't really help. If you think about it, tests that run on
untrusted headers will run whether or not you put the list servers into
your trusted network. The tests that run on the trusted boundary are
whitelisting rules (plus a few rules that will soon get moved to the
internal boundary). You might get some benefit from putting the list
servers into the internal network, but the chances are that the list is
already blocking on zen, and maybe DUL lists and SPF.
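
As a rough sketch of what putting the list servers into the internal
network could look like in local.cf: the address range below is a
placeholder (192.0.2.0/24 is reserved for documentation), not a real list
server, so substitute your own. Note that every entry in internal_networks
also has to appear in trusted_networks.

```
# Placeholder range standing in for the list servers; replace with real addresses.
trusted_networks  192.0.2.0/24
# internal_networks must be a subset of trusted_networks.
internal_networks 192.0.2.0/24
```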

> What I am unsure of is if I am poisoning my bayes by reporting these
> messages that make it through as spam. Should I be just deleting them?
> The tokens that are legitimate that will end up as collateral damage
> are going to be the list footers, the list administration messages,
> and potentially other pieces.
> 
> I'm hoping I can identify why my bayes database is so bad (it thinks
> everything is BAYES_00 now), and if this is why I will want to change
> my training behavior.

It's really hard for BAYES to work on in-list spams because they
contain so many strong ham tokens. What I would suggest is to use
a separate address and Bayes database for the lists and train it on all
spam, but only learn ham that doesn't hit BAYES_00. I use sieve to
select some in-list candidates for learning (with dspam rather than SA).
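
If you go the separate-database route with SA, one way to sketch it is to
point bayes_path somewhere else in the configuration used for the list
address. The path below is hypothetical, and how you arrange a separate
configuration per address depends on your setup.

```
# Hypothetical location for a Bayes database used only for list mail.
bayes_path /home/listuser/.spamassassin/bayes-lists
```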

You might also configure BAYES to ignore some of the list headers.
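
For instance, something along these lines. Which headers are worth
ignoring is an assumption on my part; check what your lists actually add
before copying this.

```
# Keep common mailing-list headers out of the Bayes tokens.
bayes_ignore_header List-Id
bayes_ignore_header List-Unsubscribe
bayes_ignore_header List-Post
```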

Things like challenge-response messages and out-of-office replies are
best handled with simple filtering or custom SA tests.
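
A minimal sketch of such custom tests follows. The rule names, patterns and
scores are made up for illustration and would need tuning against your own
mail.

```
# Hypothetical local rules; names and scores are illustrative only.
header   LOCAL_AUTO_REPLIED  Auto-Submitted =~ /^auto-replied/i
describe LOCAL_AUTO_REPLIED  Auto-Submitted header marks an automatic reply
score    LOCAL_AUTO_REPLIED  0.5

header   LOCAL_OOO_SUBJECT   Subject =~ /\bout of (?:the )?office\b/i
describe LOCAL_OOO_SUBJECT   Subject looks like an out-of-office notice
score    LOCAL_OOO_SUBJECT   0.5
```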