Posted to users@spamassassin.apache.org by Joe Pranevich <jp...@lycos-inc.com> on 2007/09/02 21:28:55 UTC

Outbound spam filtering for a large ISP

Hello,

I maintain a large webmail host (I bet you can figure out which one) for
free/paid accounts that sends out tens of thousands of emails a day. We're
not quite Yahoo Mail or Hotmail, but we're pretty big. We're looking to scan
outbound mail using SpamAssassin and I'm hoping that someone here might have
some suggestions or feedback on what the best way to configure this would
be. I've seen a handful of posts about this in the archive, so I know it's
come up before. 

My plan is to scan all outbound mail and drop all mails that match to a log
file or a separate directory where they can be hand-reviewed by someone in
our customer service department. We also wouldn't want to actually modify
the mails on the way out-- so we wouldn't add the spamassassin mail headers.

Does anyone here have practical experience or advice, tweaks, etc. that
would help us to implement this sort of thing? (I know the volume will be
fairly high, but a nice farm of machines all running spamd should be able to
load balance that part fairly well. It's the rules I'm worried about and how
to make the log/discard work the way I want.)

Thanks in advance for any help you can provide.

Joe

-- 
View this message in context: http://www.nabble.com/Outbound-spam-filtering-for-a-large-ISP-tf4368897.html#a12452483
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Re: Outbound spam filtering for a large ISP

Posted by SM <sm...@resistor.net>.
At 12:28 02-09-2007, Joe Pranevich wrote:
>I maintain a large webmail host (I bet you can figure out which one) for
>free/paid accounts that sends out tens of thousands of emails a day. We're
>not quite Yahoo Mail or Hotmail, but we're pretty big. We're looking to scan
>outbound mail using SpamAssassin and I'm hoping that someone here might have
>some suggestions or feedback on what the best way to configure this would
>be. I've seen a handful of posts about this in the archive, so I know it's
>come up before.

SpamAssassin can be called on the outgoing MTA.  As you are doing 
webmail, it doesn't make sense to perform tests on the Received 
headers or try to detect compromised hosts using the usual 
DNSBLs.  You will surely have other mechanisms in place to handle 
that.  You may decide to have a different setup for free and for paid 
accounts as traffic may be somewhat different.

>My plan is to scan all outbound mail and drop all mails that match to a log
>file or a separate directory where they can be hand-reviewed by someone in
>our customer service department. We also wouldn't want to actually modify
>the mails on the way out-- so we wouldn't add the spamassassin mail headers.

That may be a lot of email to review.  You don't have to rewrite the 
headers.  If the message is detected as spam, it's better to stop it 
before it leaves your network.

>Does anyone here have practical experience or advice, tweaks, etc. that
>would help us to implement this sort of thing? (I know the volume will be
>fairly high, but a nice farm of machines all running spamd should be able to
>load balance that part fairly well. It's the rules I'm worried about and how
>to make the log/discard work the way I want.)

You'll need a DB backend that can handle the load.  Depending on your 
user-base, you may need to add/remove rules, especially if they use a 
non-English language.  You'll need DNSBL feeds for the network tests 
because of the volume of the queries.  Basically your setup would 
evaluate the message content and return the score.  From there you 
can log or take whatever action you deem appropriate.
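
A minimal sketch of the "evaluate the content, return the score, then act" flow described above, in Python. It assumes the scanner's result arrives as a `score/threshold` string (the text form that `spamc -c` prints). The names `decide`, `Verdict`, and `local_threshold`, and the cutoff of 4.0, are hypothetical and only illustrate filtering below the default threshold.

```python
from typing import NamedTuple


class Verdict(NamedTuple):
    score: float
    action: str  # "deliver" or "quarantine"


def decide(check_output: str, local_threshold: float = 4.0) -> Verdict:
    """Turn a 'score/threshold' string into a routing decision.

    local_threshold is a hypothetical site-specific cutoff. It is used
    instead of the threshold reported by the scanner so that the site
    can quarantine at a lower score.
    """
    lines = check_output.strip().splitlines()
    if not lines:
        raise ValueError("empty scanner output")
    # Only the part before the slash (the score) is needed here.
    score_text, _, _reported_threshold = lines[0].partition("/")
    score = float(score_text)
    action = "quarantine" if score >= local_threshold else "deliver"
    return Verdict(score, action)
```

With these assumed values, a message scored 4.9 against a reported threshold of 5.0 would still be held for review, because the local cutoff is lower.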

Regards,
-sm 


Re: Outbound spam filtering for a large ISP

Posted by Ken A <ka...@pacific.net>.
Joe Pranevich wrote:
> Hello,
> 
> I maintain a large webmail host (I bet you can figure out which one) for
> free/paid accounts that sends out tens of thousands of emails a day. We're
> not quite Yahoo Mail or Hotmail, but we're pretty big. We're looking to scan
> outbound mail using SpamAssassin and I'm hoping that someone here might have
> some suggestions or feedback on what the best way to configure this would
> be. I've seen a handful of posts about this in the archive, so I know it's
> come up before. 
> 
> My plan is to scan all outbound mail and drop all mails that match to a log
> file or a separate directory where they can be hand-reviewed by someone in
> our customer service department. We also wouldn't want to actually modify
> the mails on the way out-- so we wouldn't add the spamassassin mail headers.
> 
> Does anyone here have practical experience or advice, tweaks, etc. that
> would help us to implement this sort of thing? (I know the volume will be
> fairly high, but a nice farm of machines all running spamd should be able to
> load balance that part fairly well. It's the rules I'm worried about and how
> to make the log/discard work the way I want.)
> 
> Thanks in advance for any help you can provide.
> 
> Joe
> 

For one more option, see http://mailscanner.info. It's Perl, works great 
with sendmail, and has a wide variety of options for queuing, 
quarantining, and classifying mail using SA and going beyond what SA 
does by itself. It's not a milter. It has a queue, check, then forward 
approach that nicely levels out the load on SA. There's also some nice 
add-on reporting available in MailWatch (SourceForge).

-- 
Ken Anderson
Pacific.Net

Re: Outbound spam filtering for a large ISP

Posted by Loren Wilton <lw...@earthlink.net>.
> I'm using. But, maybe with some Bayesian work... it would be possible. 
> But,
> as I said, I'm a bit risk averse and Bayesian poisoning is so easy,
> especially at this volume.

I would turn off AWL I think in your situation.  But by and large Bayes 
poisoning is a myth, at least for the vast majority of people.  The junk 
spammers stick in seems better at identifying spam than poisoning it.  If 
you are concerned about getting false passes from Bayes, just set the scores 
for bayes < 50 to zero, or to some smaller value than the default.  If you 
get bayes FNs, that is just more work for your customer service types, and 
will be self-correcting if they submit them as 'learn as ham'.
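
A small sketch of the score override mentioned above, written as a Python helper that prints `score` lines for a local SpamAssassin configuration file. It assumes the stock rule names for Bayes probabilities below 50% (`BAYES_00`, `BAYES_05`, `BAYES_20`, `BAYES_40`); the helper name and the example values are hypothetical. Note that SpamAssassin treats a score of 0 as disabling the rule.

```python
# Rule names for Bayes probabilities below 50%, assuming the stock ruleset.
LOW_BAYES_RULES = ("BAYES_00", "BAYES_05", "BAYES_20", "BAYES_40")


def bayes_score_overrides(value: float = 0.0) -> str:
    """Return 'score' lines that override the low-probability Bayes rules."""
    return "\n".join(f"score {rule} {value:g}" for rule in LOW_BAYES_RULES)


# Emit the lines so they can be pasted into a local configuration file.
print(bayes_score_overrides())
```

Passing a small negative value instead of 0, for example `bayes_score_overrides(-0.5)`, corresponds to the "some smaller value than the default" variant.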

Hum.  In your case network checks aren't going to help very much, since you 
are doing the original sending.  So you are going to have to depend a lot 
more on rules than most people will.  You can/should use the ClamAV 
integration to catch the vast majority of the phish messages and actual 
viruses that spammers will probably try to send.  URIBL will probably also be 
very helpful, since it checks the target domains referenced in the email 
messages.

        Loren



Re: Outbound spam filtering for a large ISP

Posted by Joe Pranevich <jp...@lycos-inc.com>.

Loren Wilton wrote:
> 
> 
> My thought on a simple solution would be to feed the webmail into
> procmail, 
> have it run SA.  SA will do report_safe markup on spam.  You can now look
> at 
> the classification result in procmail and route the probable spam to a 
> special mailbox, otherwise let it pass through.  You will still need some 
> script or tool to re-send any message that was mis-classified, but this is 
> probably some fairly trivial web app cgi.
> 
> 

Unfortunately, I need to do this a rung down. There are a couple of ways to
send email (webmail for free, SMTP server access for paid, etc.) but all of
them forward/relay mail out a bank of relatively generic servers running
sendmail. So I was hoping to throw all of the SA integration there and catch
all of the cases.

But your suggestion of using procmail may still be valid here (but I'm not
immediately sure how/if procmail will work inside of a mail relay as opposed
to local delivery). I'll need to dig further. That's just "obvious" enough
that I didn't even consider it.

As for keeping the headers, I'm not completely opposed to that. But already
a sadly surprising amount of the outbound spam that is getting sent is being
scored at 4.9 (so I'll need to filter lower...). I suspect that would only
get worse if more mailers knew they could just tune their messages to what
I'm using. But, maybe with some Bayesian work... it would be possible. But,
as I said, I'm a bit risk averse and Bayesian poisoning is so easy,
especially at this volume.

Thanks for your feedback. I wasn't aware of the report_safe option, but
that's a good call too.

Joe




Re: Outbound spam filtering for a large ISP

Posted by Loren Wilton <lw...@earthlink.net>.
> My plan is to scan all outbound mail and drop all mails that match to a 
> log
> file or a separate directory where they can be hand-reviewed by someone in
> our customer service department. We also wouldn't want to actually modify
> the mails on the way out-- so we wouldn't add the spamassassin mail 
> headers.

Some things to think about:

1    You need to get mail from the webmail app to SA
2    You need to filter in SA without munging the original mail
3    You need to route potential spam to one or more mailboxes
4    You probably want to stick a 'spammyness' indication on the subject in 
the mailbox
5    You need to resend misclassified spam, ideally without an indication 
that it was resent
6    You probably want to run Bayes, and you need to train it on at least 
the misclassifications

There are various apps that will run SA on mail and then look at the result 
and use it for forwarding the mail, and toss the SA markup (at least for 
non-spam).  You might want to use one of these, but I'm not sure it is the 
best way to go.

Another method would be to filter with SA, and set report_safe to 1 so that 
if it decides the mail is spam, it wraps it as an attachment in another 
message and outputs that.  This has the advantage of not changing the 
original message, giving you a modified subject that includes the spammyness 
indication (if you want it), and wrapping viruses as attachments so that it 
is safe to open the outer message without problems.  SA can also unwrap 
these messages to recover the original message.
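
As a rough illustration of the unwrapping step, here is a Python sketch that pulls the original message back out of a report_safe 1 wrapper, assuming the original is attached as a message/rfc822 MIME part. The function name is hypothetical. This re-parses the message with Python's email package, so it is not guaranteed to be byte-for-byte identical to what was submitted; SA's own remove-markup mode (`spamassassin -d`) is the tool described above.

```python
import email
from email.message import Message
from typing import Optional


def unwrap_original(raw_wrapper: str) -> Optional[Message]:
    """Return the first message/rfc822 attachment, or None if there is none."""
    wrapper = email.message_from_string(raw_wrapper)
    for part in wrapper.walk():
        if part.get_content_type() == "message/rfc822":
            payload = part.get_payload()
            # For message/rfc822 parts the payload is a one-element list
            # holding the parsed inner message.
            if isinstance(payload, list) and payload:
                return payload[0]
    return None
```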

My thought on a simple solution would be to feed the webmail into procmail, 
have it run SA.  SA will do report_safe markup on spam.  You can now look at 
the classification result in procmail and route the probable spam to a 
special mailbox, otherwise let it pass through.  You will still need some 
script or tool to re-send any message that was mis-classified, but this is 
probably some fairly trivial web app cgi.
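
The routing decision in that paragraph can be sketched as follows. This is Python standing in for the procmail recipe, not a procmail configuration, and it assumes SA has already processed the message and added its usual `X-Spam-Flag: YES` header to anything it classified as spam. The function name and the return labels are hypothetical.

```python
import email


def route(raw_message: str) -> str:
    """Return 'quarantine' if SpamAssassin flagged the message, else 'relay'."""
    msg = email.message_from_string(raw_message)
    # A missing header means the message was not flagged.
    flag = (msg.get("X-Spam-Flag") or "").strip().upper()
    return "quarantine" if flag == "YES" else "relay"
```

A message routed to "quarantine" would go to the special mailbox for review; everything else passes through to the relay unchanged.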

By default this will result in a SA score summary in the non-spam showing 
that it is non-spam.  You can probably play with the options on SA to make 
it not do this; but frankly I don't see a problem here.  If spammers can see 
that you are scanning outbound mail they might not bother abusing your 
service in the first place.

        Loren



Re: Outbound spam filtering for a large ISP

Posted by Kris Deugau <kd...@vianet.ca>.
Peter Mikeska (MiKi) wrote:
> here are my 2 cents ;)

I'm not Joe, but I have to disagree with some of your points.  <g>

> No word about the MTA; from other answers it looks like sendmail.
> For high volume and this kind of thing there is something fast and
> relatively easy ;)

Umm... no.  Switching MTAs on a major mail cluster is NOT trivial or easy.

qmail also requires a fair amount of patching to behave "properly" AKA 
"according to generally accepted standards of behaviour".  This may 
admittedly be *easier* in a larger environment because you can build 
once and copy that to many systems.

> - use qmail as the gateway for outbound mail (I don't know how the webmail
> is using SMTP, but it can be set up to use the qmail IP)
> - on qmail use simscan with SA - it's in C and thus fast

Your MTA and MTA-content-scanner-glue are trivial loads compared to SA 
itself.  (IIRC Joe noted that he would probably run spamd on remote hosts 
instead of SA integrated with the scanner (e.g. MIMEDefang), so that 
alters the structure a fair bit as well.)

> - in simscan you can easily set quarantine to a directory, and the best
> thing is that the message in quarantine is untouched, the clear message
> as it came in over SMTP.

The same applies to any other properly-written MTA content scanner.  As 
noted by several other people, header tests won't be nearly as much use 
in determining spamminess as they might be on the recipient side of things.

Joe, one thing I'd suggest is to try to capture "legitimate" vs "spam" 
mail for a short time, and regenerate the entire set of SA scores from 
that data.  IMO it'll likely make your system much more accurate because 
the message body tests will end up being weighted much higher due to the 
limited usefulness of most header tests.
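
To make the intuition concrete, here is a small Python sketch that compares how often each rule fires on a captured "legitimate" sample versus a "spam" sample. This is not SpamAssassin's score generation process, which is a separate and more involved procedure; it only shows the kind of per-rule evidence such a corpus provides. The function name and the input shape (one list of rule names per message) are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def hit_rates(
    ham_hits: Iterable[List[str]], spam_hits: Iterable[List[str]]
) -> Dict[str, Tuple[float, float]]:
    """Map rule name -> (fraction of ham messages hit, fraction of spam hit)."""
    # Use sets so a rule is counted at most once per message.
    ham = [set(h) for h in ham_hits]
    spam = [set(s) for s in spam_hits]
    ham_counts = Counter(rule for h in ham for rule in h)
    spam_counts = Counter(rule for s in spam for rule in s)
    rules = set(ham_counts) | set(spam_counts)
    return {
        rule: (
            ham_counts[rule] / len(ham) if ham else 0.0,
            spam_counts[rule] / len(spam) if spam else 0.0,
        )
        for rule in rules
    }
```

A rule that fires at about the same rate on both samples carries little information in this environment, which is what one would expect for many header tests on outbound mail.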

-kgd



Re: Outbound spam filtering for a large ISP

Posted by "Peter Mikeska (MiKi)" <pe...@gmail.com>.
Hello Joe,

here are my 2 cents ;)

Sunday, September 2, 2007, 9:28:55 PM, you wrote:

> Hello,

> I maintain a large webmail host (I bet you can figure out which one) for
> free/paid accounts that sends out tens of thousands of emails a day. We're
> not quite Yahoo Mail or Hotmail, but we're pretty big. We're looking to scan
> outbound mail using SpamAssassin and I'm hoping that someone here might have
> some suggestions or feedback on what the best way to configure this would
> be. I've seen a handful of posts about this in the archive, so I know it's
> come up before. 

> My plan is to scan all outbound mail and drop all mails that match to a log
> file or a separate directory where they can be hand-reviewed by someone in
> our customer service department. We also wouldn't want to actually modify
> the mails on the way out-- so we wouldn't add the spamassassin mail headers.

No word about the MTA; from other answers it looks like sendmail.
For high volume and this kind of thing there is something fast and
relatively easy ;)
- use qmail as the gateway for outbound mail (I don't know how the webmail
is using SMTP, but it can be set up to use the qmail IP)
- on qmail use simscan with SA - it's in C and thus fast
- in simscan you can easily set quarantine to a directory, and the best
thing is that the message in quarantine is untouched, the clear message
as it came in over SMTP.
- postprocessing is easy after that
- SA of course must be customized for the environment, but that is another story

> Does anyone here have practical experience or advice, tweaks, etc. that
> would help us to implement this sort of thing? (I know the volume will be
> fairly high, but a nice farm of machines all running spamd should be able to
> load balance that part fairly well. It's the rules I'm worried about and how
> to make the log/discard work the way I want.)

> Thanks in advance for any help you can provide.

if interested in help, contact me off list
Miki

> Joe




-- 
Best regards,
 Peter                            mailto:peter.mikeska@gmail.com


Re: Outbound spam filtering for a large ISP

Posted by Kelson <ke...@speed.net>.
Leon Kolchinsky wrote:
> Try amavisd-new list.
> There you could integrate your SA checks in a very efficient way (policy banks, quarantining, releasing etc.)
> MySQL backend is also a good idea on high load servers.

I'd also recommend MIMEDefang for integrating SpamAssassin into 
sendmail.  It's a milter, like amavisd-new.

We've been using it for several years on our servers.  It's very 
customizable -- basically if you can write something in Perl, you can do 
it in MD.

The authors also have a commercial product based on MIMEDefang, CanIt, 
which might be worth looking into.

MIMEDefang - http://www.mimedefang.org/
CanIt -      http://www.roaringpenguin.com/products/antiSpam

-- 
Kelson Vibber
SpeedGate Communications <www.speed.net>

RE: Outbound spam filtering for a large ISP

Posted by Leon Kolchinsky <lk...@univ.haifa.ac.il>.
> Hello,
> 
> I maintain a large webmail host (I bet you can figure out which one) for
> free/paid accounts that sends out tens of thousands of emails a day. We're
> not quite Yahoo Mail or Hotmail, but we're pretty big. We're looking to
> scan
> outbound mail using SpamAssassin and I'm hoping that someone here might
> have
> some suggestions or feedback on what the best way to configure this would
> be. I've seen a handful of posts about this in the archive, so I know it's
> come up before.
> 
> My plan is to scan all outbound mail and drop all mails that match to a
> log
> file or a separate directory where they can be hand-reviewed by someone in
> our customer service department. We also wouldn't want to actually modify
> the mails on the way out-- so we wouldn't add the spamassassin mail
> headers.
> 
> Does anyone here have practical experience or advice, tweaks, etc. that
> would help us to implement this sort of thing? (I know the volume will be
> fairly high, but a nice farm of machines all running spamd should be able
> to
> load balance that part fairly well. It's the rules I'm worried about and
> how
> to make the log/discard work the way I want.)
> 
> Thanks in advance for any help you can provide.
> 
> Joe
> 


Try amavisd-new list.
There you could integrate your SA checks in a very efficient way (policy banks, quarantining, releasing etc.)
MySQL backend is also a good idea on high load servers.



Regards,
Leon Kolchinsky