Posted to users@spamassassin.apache.org by Kelson Vibber <KV...@tollfreeforwarding.com> on 2011/07/28 20:55:30 UTC

Bayes & Apache James server

I'm setting up the spam/virus filtering on an Apache James mail server, and SpamAssassin is one of the pieces we plan on using. I used to run a Sendmail-based server with SpamAssassin for years at a previous job, so I'm familiar with SA, but I'm still new to James.

James includes a plugin for Bayesian spam filtering. So far, the main advantage I see for it is that it includes a system to train the filter by forwarding attachments.

Does anyone here have experience with *both* James's Bayesian filter and SA's?

If so, would you recommend:

1.       Sticking with SA's Bayesian filter?

2.       Running SpamAssassin without Bayes, then James' BayesianAnalysis mailet?

3.       Running James's BayesianAnalysis mailet first, then SpamAssassin without Bayes?

In case it makes a difference, we're running James 2.3 with the SpamAssassin mailet backported from 3.0, and we'll be using a sitewide database (at least to begin with).

Thanks in advance,

Kelson Vibber
TollFreeForwarding.com, Development


RE: Bayes & Apache James server

Posted by Kelson Vibber <KV...@tollfreeforwarding.com>.
> -----Original Message-----
> From: David F. Skoll [mailto:dfs@roaringpenguin.com]
>
> It's probably more efficient to have the thing that would block more mail run
> first.  On our installation, for example, ClamAV stops less than 0.1% of all mail
> (yes, you read that right), so running it first is useless from a performance
> standpoint since SA would be invoked almost all the time anyway.

Depends on the requirements. In our case, we're blocking viruses but tagging spam for later, so it's slightly more efficient to do the virus scan first. Even if it blocks <1%, it's still greater than zero.
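A toy cost model makes this trade-off concrete. It is an illustration, not a measurement: the relative per-message costs are made-up assumptions, and only the ~0.1% ClamAV block rate comes from David's message. The model assumes that each stage only sees mail the previous stage did not block.

```python
def expected_cost(first_cost: float, first_block_rate: float, second_cost: float) -> float:
    """Expected per-message cost of a two-stage filter chain.

    The first stage sees every message; the second stage only sees the
    fraction of mail the first stage did not block (reject/discard).
    """
    return first_cost + (1.0 - first_block_rate) * second_cost


# Hypothetical relative costs: one SA pass is assumed to cost 10x one AV pass.
AV_COST, SA_COST = 1.0, 10.0
AV_BLOCK = 0.001  # ~0.1% of all mail, the ClamAV figure David reports
SA_BLOCK = 0.0    # SA only tags spam here, so it never stops mail reaching AV

av_first = expected_cost(AV_COST, AV_BLOCK, SA_COST)
sa_first = expected_cost(SA_COST, SA_BLOCK, AV_COST)
print(f"AV first: {av_first:.3f}, SA first: {sa_first:.3f}")
```

This prints `AV first: 10.990, SA first: 11.000`. Because SA blocks nothing in a tag-only setup, running SA first always pays the full AV cost as well, so AV-first can only tie or win, whatever the cost ratio.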

> We don't use the Sane Security signatures.  If using them would make Clam
> block (say) 10% or more of all messages, I'd have to re-evaluate my opinion.

I wish I could remember the stats from my old job. We had a system that started with IP block lists, then ClamAV with a bunch of the Sane Security spam signatures, then SpamAssassin, all tied together with MIMEDefang. (Thank you, BTW - that piece of software gave me so much flexibility in our scanning!) I had MD sort out the virus hits vs. the spam hits from Clam and decide what got discarded, what got blocked, and what got sent along to SA. I seem to remember it being worth it, but I just can't remember the numbers.
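The "sort out virus hits vs. spam hits" step can be sketched roughly as follows. This is a hypothetical Python illustration, not MIMEDefang's actual filter API (MIMEDefang filters are written in Perl). It assumes Sane Security signatures can be recognised by a name prefix; the prefix tuple is illustrative, not an authoritative list, and the discard case is left out.

```python
from enum import Enum
from typing import Optional

# Assumption: spam/phish signatures from the Sane Security feeds can be
# recognised by name prefix. Illustrative only; extend for other feeds.
SPAM_SIG_PREFIXES = ("Sanesecurity.",)


class Action(Enum):
    REJECT = "reject"          # stock ClamAV virus hit: refuse the message
    SEND_TO_SA = "send-to-sa"  # spam-signature hit or clean: let SA score it


def route_clam_result(signature: Optional[str]) -> Action:
    """Decide what to do with a ClamAV result (None means no hit)."""
    if signature is None:
        return Action.SEND_TO_SA
    if signature.startswith(SPAM_SIG_PREFIXES):
        # Spam/phish signature: treat it as evidence for SA, not as a virus.
        return Action.SEND_TO_SA
    return Action.REJECT
```

For example, a hypothetical hit named `Sanesecurity.Spam.12345` goes on to SA, while `Win.Trojan.Example-1` is rejected.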

Kelson Vibber
TollFreeForwarding.com, Development




Re: Bayes & Apache James server

Posted by "David F. Skoll" <df...@roaringpenguin.com>.
On Fri, 29 Jul 2011 15:08:34 -0400
Adam Moffett <ad...@plexicomm.net> wrote:

> I've often mused about which should run first, but never did any sort
> of testing.  Is it pretty much the general consensus that it's less 
> wasteful for the AV to scan the spam than to have SA scan the malware?

It's probably more efficient to have the thing that would block more
mail run first.  On our installation, for example, ClamAV stops less
than 0.1% of all mail (yes, you read that right), so running it first
is useless from a performance standpoint since SA would be invoked
almost all the time anyway.

We don't use the Sane Security signatures.  If using them would make
Clam block (say) 10% or more of all messages, I'd have to re-evaluate
my opinion.

Regards,

David.
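David's rule of thumb can be made precise with a simple cost model. The model is an assumption, not something measured in the thread: stage A costs c_A per message and blocks a fraction b_A of all mail, stage B likewise costs c_B and blocks b_B, and only unblocked mail reaches the second stage.

```latex
\begin{aligned}
C_{A \to B} &= c_A + (1 - b_A)\, c_B \\
C_{B \to A} &= c_B + (1 - b_B)\, c_A \\
C_{A \to B} - C_{B \to A} &= b_B\, c_A - b_A\, c_B
\end{aligned}
```

So running A first is cheaper exactly when b_A c_B > b_B c_A, i.e. when b_A / c_A > b_B / c_B. With equal costs this reduces to "run whichever blocks more first". If SA only tags (b_SA = 0), the difference for AV-then-SA is -b_AV c_SA, which is never positive.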

Re: Bayes & Apache James server

Posted by Dave Funk <db...@engineering.uiowa.edu>.
On Fri, 29 Jul 2011, Adam Moffett wrote:

> On 07/29/2011 02:13 PM, Kelson Vibber wrote:
>>> Also, to complete the system, I recall there were some AV-mailets at
>>> the time. If possible, use them before SA to catch messages carrying
>>> viruses.
>> Absolutely - we've got ClamAV running first, before anything touches SA, 
>> and using some of the SaneSecurity signature sets to catch additional 
>> malware.
>
> I've often mused about which should run first, but never did any sort of 
> testing.  Is it pretty much the general consensus that it's less wasteful for 
> the AV to scan the spam than to have SA scan the malware?

Need to keep in mind that the AV scans and SA scans have somewhat
different criteria for what to scan and how to deal with the results
of the scan.

E.g., you don't want to SA-scan mail with large binary attachments, but you
do want to AV-scan such critters.

Users may want to receive spam-tagged messages to judge for themselves;
in general, you do not want to give users AV-detected malware.

You probably want to run AV scanning at the front end of the mail process
so you can SMTP-reject malware; SA scanning could be deferred to later
in the processing chain.

As AV scanning tends to be a less resource-intensive process, you
probably want to do that first.
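The points above can be sketched as one per-message decision. This is a hypothetical Python illustration: the AV and SA checks are passed in as plain callables rather than real ClamAV/spamd clients, and the size cutoff is an assumed value, not a SpamAssassin default. Only the 5.0 threshold matches SpamAssassin's default `required_score`.

```python
from typing import Callable

REQUIRED_SCORE = 5.0       # SpamAssassin's default required_score
SA_MAX_BYTES = 500 * 1024  # hypothetical cutoff for skipping SA on large mail


def process(message: bytes,
            av_is_infected: Callable[[bytes], bool],
            sa_score: Callable[[bytes], float]) -> str:
    """Front-end AV with a hard reject, then size-limited SA that only tags."""
    # AV runs on everything, including large binary attachments,
    # and malware is refused outright at SMTP time.
    if av_is_infected(message):
        return "smtp-reject"
    # Large messages skip SA entirely.
    if len(message) > SA_MAX_BYTES:
        return "deliver"
    # Spam is tagged rather than rejected so users can judge for themselves.
    if sa_score(message) >= REQUIRED_SCORE:
        return "deliver-tagged"
    return "deliver"
```

Note that `sa_score` is only called for messages that pass AV and the size check, which is where the resource saving comes from.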

FWIW, I run two different instances of ClamAV: one with the stock
signatures as a milter front end that SMTP-rejects viruses, and one with
SaneSecurity and other additional sigs via the SA clam plugin to aid
spam/phish detection.

The SaneSecurity sigs are good but have too high an FP rate for me
to feel comfortable running them as an SMTP reject process. I'm
quite happy to run them as part of SA, where Bayes, whitelists,
score adjustments, etc. can ameliorate damage from FPs.
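The difference between "reject on any hit" and "one score among many" can be sketched as follows. The rule names and scores are hypothetical, chosen only for illustration; only the 5.0 threshold matches SpamAssassin's default `required_score`.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score

# Hypothetical rule names and scores, for illustration only.
fp_ham = {"SANESEC_SIG_HIT": 4.0, "BAYES_SAYS_HAM": -2.0}
real_spam = {"SANESEC_SIG_HIT": 4.0, "BAYES_SAYS_SPAM": 3.5}


def reject_on_hit(hits: dict) -> bool:
    """SMTP-reject model: any Sane Security hit refuses the message."""
    return "SANESEC_SIG_HIT" in hits


def sa_is_spam(hits: dict) -> bool:
    """SA model: the hit is one score among many, summed against the threshold."""
    return sum(hits.values()) >= REQUIRED_SCORE
```

Under the reject model, both messages are refused. Inside SA, the false positive totals 2.0 and is delivered as ham, while the real spam totals 7.5 and is still tagged.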


-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: Bayes & Apache James server

Posted by Jason Bertoch <ja...@i6ix.com>.
On 7/29/2011 3:08 PM, Adam Moffett wrote:
> On 07/29/2011 02:13 PM, Kelson Vibber wrote:
>>> Also, to complete the system, I recall there were some AV-mailets
>>> at the time. If possible, use them before SA to catch messages
>>> carrying viruses.
>> Absolutely - we've got ClamAV running first, before anything touches 
>> SA, and using some of the SaneSecurity signature sets to catch 
>> additional malware.
> I've often mused about which should run first, but never did any sort 
> of testing.  Is it pretty much the general consensus that it's less 
> wasteful for the AV to scan the spam than to have SA scan the malware?
>
>

It depends on your setup and, more importantly, your ability to feed 
mail back into Bayes.  For my last setup, the filter sat in front of 
customer-hosted servers, which left no easy feed back into Bayes.  As a 
result, I had to use autolearn on a carefully maintained filter.  In my 
case, Bayes performed extraordinarily better when run prior to clam 
(with SaneSecurity) due to seeing the bad mail.  I'd done the opposite 
for some time before testing this, and needed to retrain the database 
more often than I cared to, because it thought everything was ham.  I
never saw a performance hit on a server handling 1 million messages/day,
but the Bayes accuracy was far better.
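The skew Jason describes can be sketched as a simplified autolearn counter. The thresholds are SpamAssassin's defaults for `bayes_auto_learn_threshold_spam` and `bayes_auto_learn_threshold_nonspam`. Real autolearn applies further conditions not modelled here, and the traffic sample is made up. The assumption is that the most obvious spam is also what ClamAV with SaneSecurity catches.

```python
from typing import Iterable, Tuple

# SpamAssassin defaults for bayes_auto_learn_threshold_spam / _nonspam.
AUTOLEARN_SPAM = 12.0
AUTOLEARN_HAM = 0.1


def autolearn_counts(stream: Iterable[Tuple[float, bool]], clam_first: bool) -> dict:
    """Count what Bayes autolearn would be fed (simplified).

    Each item is (sa_score, clam_would_block).
    """
    learned = {"spam": 0, "ham": 0}
    for score, clam_blocks in stream:
        if clam_first and clam_blocks:
            continue  # rejected before SA, so Bayes never sees it
        if score >= AUTOLEARN_SPAM:
            learned["spam"] += 1
        elif score < AUTOLEARN_HAM:
            learned["ham"] += 1
    return learned


# Hypothetical traffic: (SA score, would ClamAV block it?)
traffic = [(15.0, True), (14.0, True), (13.0, False), (-1.0, False), (0.0, False)]
print(autolearn_counts(traffic, clam_first=True))   # {'spam': 1, 'ham': 2}
print(autolearn_counts(traffic, clam_first=False))  # {'spam': 3, 'ham': 2}
```

With ClamAV in front, Bayes loses most of its spam training examples while still learning all the ham. This matches the "it thought everything was ham" symptom.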

$.02

-- 
/Jason


Re: Bayes & Apache James server

Posted by Adam Moffett <ad...@plexicomm.net>.
On 07/29/2011 02:13 PM, Kelson Vibber wrote:
>> Also, to complete the system, I recall there were some AV-mailets at the time. If possible, use them before SA to catch messages carrying viruses.
> Absolutely - we've got ClamAV running first, before anything touches SA, and using some of the SaneSecurity signature sets to catch additional malware.
I've often mused about which should run first, but never did any sort of 
testing.  Is it pretty much the general consensus that it's less 
wasteful for the AV to scan the spam than to have SA scan the malware?



RE: Bayes & Apache James server

Posted by Kelson Vibber <KV...@tollfreeforwarding.com>.
> That said, I would suggest not decoupling Bayes from SA, since I don't see any advantage
> in that approach, and you would miss the Bayes score in the SA total. You would
> end up with more FPs, due to the Bayesian mailet running separately and needing special score
> thresholds in SA.

That was my thinking as well.  Thanks for confirming that I'm on the right track.

> I would also suggest avoiding amavisd and the like to run SA tests:
> that application supplies some message routing schemes which are really useful with
> "simple" mail exchangers, but which may complicate things a lot with a mailet-based design. I
> would suggest using spamd instead.

Hmm, that's something I hadn't thought about. As it is, it's not a problem. James 3.0 includes a mailet that talks directly to spamd, and we backported it to the version we're running.
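For reference, talking "directly to spamd" means speaking its simple text protocol over TCP. The following is a minimal Python sketch based on the CHECK request and `Spam:` reply header as described in SpamAssassin's spamd protocol notes. It is not the James mailet's code; the host and port defaults (spamd's usual 783) are assumptions to adjust for your setup.

```python
import socket


def build_check_request(message: bytes) -> bytes:
    """Build a spamd CHECK request (SPAMC/1.2 protocol)."""
    header = f"CHECK SPAMC/1.2\r\nContent-length: {len(message)}\r\n\r\n"
    return header.encode("ascii") + message


def parse_check_response(response: bytes) -> tuple:
    """Parse a spamd CHECK reply into (is_spam, score, threshold)."""
    head = response.split(b"\r\n\r\n", 1)[0].decode("ascii")
    lines = head.split("\r\n")
    status = lines[0].split()  # e.g. ["SPAMD/1.1", "0", "EX_OK"]
    if len(status) < 3 or status[1] != "0":
        raise ValueError(f"spamd error: {lines[0]}")
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "spam":
            # value looks like " True ; 15.0 / 5.0"
            flag, _, scores = value.partition(";")
            score, _, threshold = scores.partition("/")
            return flag.strip().lower() in ("true", "yes"), float(score), float(threshold)
    raise ValueError("no Spam: header in spamd reply")


def check_with_spamd(message: bytes, host: str = "127.0.0.1", port: int = 783) -> tuple:
    """Send one message to a running spamd and return its verdict."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_check_request(message))
        sock.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return parse_check_response(b"".join(chunks))
```

Keeping request building and reply parsing separate from the socket code makes them testable without a live spamd.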

> Also, to complete the system, I recall there were some AV-mailets at the time. If possible, use them before SA to catch messages carrying viruses.

Absolutely - we've got ClamAV running first, before anything touches SA, and using some of the SaneSecurity signature sets to catch additional malware.

Thanks!

Kelson Vibber
TollFreeForwarding.com, Development



RE: Bayes & Apache James server

Posted by Giampaolo Tomassoni <Gi...@Tomassoni.biz>.
> From: Kelson Vibber [mailto:KV@tollfreeforwarding.com] 
> 
> [...]
> 
> If so, would you recommend:
> 1. Sticking with SA's Bayesian filter?
> 2. Running SpamAssassin without Bayes, then James' BayesianAnalysis
> mailet?
> 3. Running James's BayesianAnalysis mailet first, then SpamAssassin
> without Bayes?

This is interesting: I looked at James some years ago for a project of mine,
but it still looked a bit immature for production to me. Things have surely
evolved since then.

I don't have any direct experience with it, so I can only guess at how it
would best interact with SA.

That said, I would suggest not decoupling Bayes from SA, since I don't
see any advantage in that approach, and you would miss the Bayes
score in the SA total. You would end up with more FPs, due to the Bayesian
mailet running separately and needing special score thresholds in SA.
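Giampaolo's FP argument can be sketched by comparing a single combined score with a separate Bayes verdict. Everything here except SpamAssassin's default `required_score` of 5.0 is hypothetical: the probability-to-points bands, the standalone cutoff, and the example scores. The sketch does not model how James's BayesianAnalysis mailet actually behaves.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score


def bayes_points(prob: float) -> float:
    """Map a Bayes spam probability to score points.

    Hypothetical bands, loosely in the spirit of SA's BAYES_xx rules;
    the real rules and scores differ.
    """
    if prob >= 0.99:
        return 3.5
    if prob >= 0.9:
        return 2.0
    if prob <= 0.01:
        return -1.5
    return 0.0


def integrated_is_spam(other_rules: float, bayes_prob: float) -> bool:
    """Bayes inside SA: one contribution to a single total."""
    return other_rules + bayes_points(bayes_prob) >= REQUIRED_SCORE


def decoupled_is_spam(other_rules: float, bayes_prob: float,
                      bayes_cutoff: float = 0.9) -> bool:
    """Separate Bayes filter: it can flag a message on its own."""
    return bayes_prob >= bayes_cutoff or other_rules >= REQUIRED_SCORE
```

Consider a ham message whose other rules total -2.0 but whose Bayes probability is 0.95. It is flagged as spam by `decoupled_is_spam` but stays at 0.0 points, and therefore ham, under `integrated_is_spam`. Avoiding that false positive in the decoupled design means tuning a second, separate threshold.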

I would also suggest avoiding amavisd and the like to run SA tests:
that application supplies some message routing schemes which are really
useful with "simple" mail exchangers, but which may complicate things a lot
with a mailet-based design. I would suggest using spamd instead.

Also, to complete the system, I recall there were some AV-mailets at the
time. If possible, use them before SA to catch messages carrying viruses.


> In case it makes a difference, we're running James 2.3 with
> the SpamAssassin mailet backported from 3.0, and we'll be
> using a sitewide database (at least to begin with).
> 
> Thanks in advance,
> 
> Kelson Vibber
> TollFreeForwarding.com, Development

Good luck,

Giampaolo