Posted to users@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2006/10/24 19:42:58 UTC

RFC: spam trapping with policyd-weight and DNSBLs?

Hey --

just to turn the tables for a bit ;), I've recently been considering a
problem and a possible solution, and could do with SpamAssassin users'
advice.

These days, I've been forced to use SBL/XBL as an upfront anti-spam check,
rejecting spam at RCPT TO: time during the SMTP transaction. (Previously
I'd been running it from SpamAssassin in the usual manner.) That's great,
and it works well, rejecting a *lot* of spam and saving a lot of CPU time
by not running SpamAssassin. ;)

However: it's important for SpamAssassin developers and mass-checkers to
get a "representative" feed of spam -- with all kinds of spam included --
so that the rules are measured against something close to reality.  This,
unfortunately, implies that discarding mails that hit SBL/XBL is a bad
thing, since those mails won't get into the mass-checked corpora -- and
what will be mass-checked from that point on is just the 25% of spam that
evades those rules.

Bug 5096 suggests that we replace some of the mass-check corpora with
pure-spamtrap feeds to fix this.  Bit of a heavy fix :(

There's another way, though.  If it were possible to change the SMTP
transaction flowchart to include this:

  - is IP listed in SBL/XBL?
    - if not listed, deliver as normal;
    - else if listed, continue SMTP transaction as if normal delivery is
      underway, but deliver to a spamtrap mbox file or maildir.

Then we could avoid the delivery overhead of that spam -- the only
"delivery" is an append to a file or write to a maildir -- while still
recording the spam in question.
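
A minimal Python sketch of the flow above, for illustration only. It
assumes IPv4 clients, the combined zone name sbl-xbl.spamhaus.org, and a
hypothetical normal_delivery callable standing in for whatever the MTA
normally does; none of this is existing Postfix or SpamAssassin code.

```python
import mailbox
import socket
from pathlib import Path
from typing import Callable

# Assumption: the combined SBL/XBL zone; substitute whatever list you use.
DNSBL_ZONE = "sbl-xbl.spamhaus.org"


def dnsbl_query_name(ip: str, zone: str = DNSBL_ZONE) -> str:
    """Build the DNSBL query name: reversed IPv4 octets followed by the zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def is_listed(ip: str, resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """A listed IP resolves to an A record; an unlisted one fails to resolve."""
    try:
        resolve(dnsbl_query_name(ip))
        return True
    except socket.gaierror:
        return False


def deliver(raw_message: bytes, client_ip: str, trap_dir: Path,
            normal_delivery: Callable[[bytes], None],
            resolve: Callable[[str], str] = socket.gethostbyname) -> str:
    """Accept every message, but file mail from listed IPs in a spamtrap maildir."""
    if is_listed(client_ip, resolve):
        trap = mailbox.Maildir(str(trap_dir), create=True)
        trap.add(raw_message)  # the only "delivery" is a write to the maildir
        return "spamtrap"
    normal_delivery(raw_message)  # hypothetical hook for the usual delivery path
    return "normal"
```

The resolver is passed in as a parameter so the routing decision can be
exercised without touching the network.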

It appears Postfix would allow this using http://www.policyd-weight.org/
-- look up DNSBLs, give it a "weight", if weight is too high, add a
header.  Postfix rules can then intercept messages with that header and
divert to the spamtrap mbox.
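
The weight-then-header idea can be sketched roughly as follows. This is
not policyd-weight's actual code or configuration: the zone weights, the
threshold and the X-Spamtrap-Candidate header name are all made up for
illustration, and the DNSBL lookup is passed in as a plain callable.

```python
from typing import Callable, Dict

# Hypothetical weights per DNSBL zone; dnsbl.example.org is a placeholder.
WEIGHTS: Dict[str, float] = {
    "sbl-xbl.spamhaus.org": 4.0,
    "dnsbl.example.org": 1.5,
}
THRESHOLD = 4.0  # assumed cut-off: at or above this, tag the message
TRAP_HEADER = b"X-Spamtrap-Candidate: yes\r\n"  # hypothetical header name

Lookup = Callable[[str, str], bool]  # (client_ip, zone) -> listed?


def dnsbl_weight(ip: str, is_listed: Lookup,
                 weights: Dict[str, float] = WEIGHTS) -> float:
    """Sum the weights of every zone that lists the client IP."""
    return sum(w for zone, w in weights.items() if is_listed(ip, zone))


def tag_if_heavy(raw: bytes, ip: str, is_listed: Lookup,
                 threshold: float = THRESHOLD) -> bytes:
    """Prepend the marker header when the combined weight reaches the threshold."""
    if dnsbl_weight(ip, is_listed) >= threshold:
        return TRAP_HEADER + raw
    return raw


def wants_spamtrap(raw: bytes) -> bool:
    """Later stage: divert any message whose header block carries the marker."""
    headers = raw.split(b"\r\n\r\n", 1)[0]
    return any(line.lower().startswith(b"x-spamtrap-candidate:")
               for line in headers.split(b"\r\n"))
```

A real setup would presumably also strip any copy of the marker header
that arrives from outside, so that senders cannot set it themselves.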

Has anyone done this?  Got code you'd like to share?


(In the meantime, I'm just going back to removing the BL, using
SpamAssassin instead, and using the Shortcircuit plugin to reduce CPU load
if RCVD_IN_SBL or RCVD_IN_XBL fires.)

--j.

Re: RFC: spam trapping with policyd-weight and DNSBLs?

Posted by Nigel Frankcom <ni...@blue-canoe.net>.

On Tue, 24 Oct 2006 18:42:58 +0100, jm@jmason.org (Justin Mason)
wrote:

>Hey --
>
>just to turn the tables for a bit ;), I've recently been considering a
>problem and a possible solution, and could do with SpamAssassin users'
>advice.
>
>These days, I've been forced to use SBL/XBL as an upfront anti-spam check,
>rejecting spam at RCPT TO: time during the SMTP transaction. (Previously
>I'd been running it from SpamAssassin in the usual manner.) That's great,
>and it works well, rejecting a *lot* of spam and saving a lot of CPU time
>by not running SpamAssassin. ;)
>
>However: it's important for SpamAssassin developers and mass-checkers to
>get a "representative" feed of spam -- with all kinds of spam included --
>so that the rules are measured against something close to reality.  This,
>unfortunately, implies that discarding mails that hit SBL/XBL is a bad
>thing, since those mails won't get into the mass-checked corpora -- and
>what will be mass-checked from that point on is just the 25% of spam that
>evades those rules.
>
>Bug 5096 suggests that we replace some of the mass-check corpora with
>pure-spamtrap feeds to fix this.  Bit of a heavy fix :(
>
>There's another way, though.  If it were possible to change the SMTP
>transaction flowchart to include this:
>
>  - is IP listed in SBL/XBL?
>    - if not listed, deliver as normal;
>    - else if listed, continue SMTP transaction as if normal delivery is
>      underway, but deliver to a spamtrap mbox file or maildir.
>
>Then we could avoid the delivery overhead of that spam -- the only
>"delivery" is an append to a file or write to a maildir -- while still
>recording the spam in question.
>
>It appears Postfix would allow this using http://www.policyd-weight.org/
>-- look up DNSBLs, give it a "weight", if weight is too high, add a
>header.  Postfix rules can then intercept messages with that header and
>divert to the spamtrap mbox.
>
>Has anyone done this?  Got code you'd like to share?
>
>
>(In the meantime, I'm just going back to removing the BL, using
>SpamAssassin instead, and using the Shortcircuit plugin to reduce CPU load
>if RCVD_IN_SBL or RCVD_IN_XBL fires.)
>
>--j.

I thought one of the big issues was to reduce the load on SA & mail
servers in general. Surely using SBL/XBL and the like to drop mail at
connect time is much more efficient than running it through the entire
mail process. In my own experience, the MTA-led RBLs are the front line
and stop much of the spam from ever reaching the SA boxen, thus reducing
overall strain on the systems here.

I run 3 dedicated backend SA boxen of a relatively high spec, and I
don't see a huge amount of mail compared with some list users. Adding
the extra strain of putting RBL'd mail through SA would seem to me to
be a pointless exercise?

A recent back-of-a-matchbook check showed that, between RBLs, SA and
additional MTA anti-spam systems, we are killing 99.98% of spam; this is
based on total spam caught against user-reported spam.

For me that's a good number, why would I want to add in the extra
strain?

Kind regards

Nigel

Re: RFC: spam trapping with policyd-weight and DNSBLs?

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.

Justin Mason wrote:

> (In the meantime, I'm just going back to removing the BL, using
> SpamAssassin instead, and using the Shortcircuit plugin to reduce CPU load
> if RCVD_IN_SBL or RCVD_IN_XBL fires.)

Can you selectively short-circuit based on the user's prefs (ie. spam 
traps vs not)?  Short-circuiting mail that you're going to masscheck 
won't be too compatible with --reuse...

Daryl

Re: RFC: spam trapping with policyd-weight and DNSBLs?

Posted by Giampaolo Tomassoni <g....@libero.it>.

> CAVEAT: just because the client is listed on sbl-xbl does not mean the 
> message is spam. In particular:
> - a legit user may be sending through a listed server.
> - a spammer may "corpus-corrupt" you by sending ham messages (slightly 
> modified copies from mailing lists)

I too like SA's scoring system a lot: it keeps an overly radical view of spam from causing trouble for my customers.

Justin, if you are concerned about CPU load, why don't you adopt greylisting instead of shutting the door on possible ham? It gives the very same CPU relief, but it allows well-configured SMTP servers (possibly legitimate ones) to submit their messages to SA for a more precise evaluation.

Giampaolo

Re: RFC: spam trapping with policyd-weight and DNSBLs?

Posted by mouss <us...@free.fr>.

Justin Mason wrote:
> Hey --
>
> just to turn the tables for a bit ;), I've recently been considering a
> problem and a possible solution, and could do with SpamAssassin users'
> advice.
>
> These days, I've been forced to use SBL/XBL as an upfront anti-spam check,
> rejecting spam at RCPT TO: time during the SMTP transaction. (Previously
> I'd been running it from SpamAssassin in the usual manner.) That's great,
> and it works well, rejecting a *lot* of spam and saving a lot of CPU time
> by not running SpamAssassin. ;)
>
> However: it's important for SpamAssassin developers and mass-checkers to
> get a "representative" feed of spam -- with all kinds of spam included --
> so that the rules are measured against something close to reality.  This,
> unfortunately, implies that discarding mails that hit SBL/XBL is a bad
> thing, since those mails won't get into the mass-checked corpora -- and
> what will be mass-checked from that point on is just the 25% of spam that
> evades those rules.
>
> Bug 5096 suggests that we replace some of the mass-check corpora with
> pure-spamtrap feeds to fix this.  Bit of a heavy fix :(
>
> There's another way, though.  If it were possible to change the SMTP
> transaction flowchart to include this:
>
>   - is IP listed in SBL/XBL?
>     - if not listed, deliver as normal;
>     - else if listed, continue SMTP transaction as if normal delivery is
>       underway, but deliver to a spamtrap mbox file or maildir.
>   

CAVEAT: just because the client is listed on sbl-xbl does not mean the 
message is spam. In particular:
- a legit user may be sending through a listed server.
- a spammer may "corpus-corrupt" you by sending ham messages (slightly 
modified copies from mailing lists)

You can of course consider the first a non-critical issue 
(statistically speaking, at least). But if spammers know what you're 
doing, the second point may become an issue (this is true of 
spamtraps in general; I don't know why spammers don't do it...).

Re: RFC: spam trapping with policyd-weight and DNSBLs?

Posted by John Rudd <jr...@ucsc.edu>.

Jason Haar wrote:

> 
> Now if only it could deal with this storm of "VIiiagra"/"VIragra" spam
> that has been sneaking in... :-)
> 

The rules I posted in the "Scoring PTR's" thread seem to be doing a 
great job of catching those.

Re: RFC: spam trapping with policyd-weight and DNSBLs?

Posted by Jason Haar <Ja...@trimble.co.nz>.

Justin Mason wrote:
> However: it's important for SpamAssassin developers and mass-checkers to
> get a "representative" feed of spam -- with all kinds of spam included --
> so that the rules are measured against something close to reality.  
On a related note, we actually *stopped* using front-line RBLs as with
them in place, we were no longer able to get true stats as to the actual
flow of Spam/Ham into our sites. Which meant that we really couldn't
tell how effective our antispam systems were being. The "broad axe" that
is RBL meant that a single mail message coming from servers may be
blocked dozens of times (as it retries), meaning that our stats would
over-represent the effectiveness of front-line RBL methods. Now we just
let it all hit SpamAssassin, and have simply upped the score on those
RBLs we used to trust to reject directly, so that the Spam doesn't get
any further. End result: no delivery changes - but better quality stats.
Obviously you have to have over-specced your mail servers to be able to
do this - something poor old Justin can't manage I think :-)
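
A rough Python sketch of the retry effect described above, assuming the
MTA's reject log can be reduced to (client IP, sender, recipient) records.
That triple is only a proxy for "one message", since a rejection at RCPT
TO: time happens before any message content is seen; the record layout is
hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Rejection(NamedTuple):
    """One RBL rejection as it might be parsed from an MTA log (hypothetical)."""
    client_ip: str
    sender: str
    recipient: str


def summarize(rejections: Iterable[Rejection]) -> Dict[str, int]:
    """Compare raw rejection attempts with the number of distinct triples."""
    attempts = Counter(rejections)
    return {
        "attempts": sum(attempts.values()),  # what naive stats would report
        "distinct": len(attempts),           # closer to the real message count
    }
```

If one blocked sender retries a dozen times, "attempts" rises by twelve
while "distinct" rises by one, which is the over-representation in
question.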

(FYI: picking a random user of ours and looking at all Internet email
they received in Aug 2006 showed SA had >99% success rate at tagging
Spam. 85% was quarantined (scores >10/5) and the rest tagged for the
users to filter on. Also, ZERO ham misclassification - which is
something certain commercial competitors to SpamAssassin are actually
pretty bad at...)

Now if only it could deal with this storm of "VIiiagra"/"VIragra" spam
that has been sneaking in... :-)

-- 
Cheers

Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +64 3 9635 377 Fax: +64 3 9635 417
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1