Posted to users@spamassassin.apache.org by Thomas Bolioli <tp...@terranovum.com> on 2004/02/16 16:08:38 UTC

selling a client on SA

I have a client who is vacillating between SA and some commercial 
products. Most of the commercial products are not as accurate as SA but 
my client has some valid points. I have limited experience setting up SA 
(always plain vanilla installs through RH and single user systems) and I 
wanted to get some feedback on what is possible/feasible from the list. 
It would be used in a multi-mail-server environment where all external mail 
would be routed through it for checks, with modified headers and subjects 
used to tag spam.
The issues are:
1) Does SA auto upgrade to newer versions (of the spam defs more 
importantly) or could it be set up to do so through an automated CPAN 
install?
2) Can individual users easily have their own training files (I know it 
is possible, I'm looking more for feasibility) when they do not have an 
account on the system? I.e., their training file could be located via the 
address the mail was sent to. If so, is there anything out there to manage 
that? If not, would it be possible to write a mail handler to parse the mail 
addressed to a spam@host mailbox that users could forward spam to? This 
script could then use the From line to determine which user to train on.
3) What other issues will I run into in a multi user environment where 
the SA server is simply forwarding on mail between MTAs and not the 
final delivery agent?
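
A minimal sketch of the handler idea in question 2, assuming Python and its
standard email package. It only shows how a script might map the From line of
a submission to a per-user training location. The directory root, the function
names and the allowed-domain check are hypothetical and not part of
SpamAssassin. Note that a From line is easy to forge, so a real handler would
need some further check before training on it.

```python
from email import message_from_bytes, policy
from email.utils import parseaddr
from pathlib import Path
from typing import Optional, Set

# Hypothetical root under which one training directory per user is kept.
BAYES_ROOT = Path("/var/spool/sa-train")


def submitting_user(raw: bytes, allowed_domains: Set[str]) -> Optional[str]:
    """Return the lower-cased From address, or None if it is not one of ours."""
    msg = message_from_bytes(raw, policy=policy.default)
    _, addr = parseaddr(str(msg.get("From", "")))
    addr = addr.lower()
    local, sep, domain = addr.rpartition("@")
    if not sep or not local or domain not in allowed_domains:
        return None
    return addr


def training_dir(user: str) -> Path:
    """Map user@domain to a directory, keeping only safe characters."""
    local, _, domain = user.rpartition("@")
    # Replace anything other than letters, digits, '-' and '_' (dots too),
    # so the local part can never become a path component such as "..".
    safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in local)
    return BAYES_ROOT / domain / safe


if __name__ == "__main__":
    raw = b"From: Tom <Tom@Example.com>\nTo: spam@example.com\n\nfwd\n"
    user = submitting_user(raw, {"example.com"})
    if user is not None:
        print(user, training_dir(user))
```

Submissions from unknown senders return None here, so the handler can drop
them or queue them for manual review instead of training on them.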

Thanks in advance,
Tom

Re: selling a client on SA

Posted by Lucas Albers <ad...@cs.montana.edu>.

Gregory Sloop, Sloop Network & Computer Consulting said:

> I assume you mean that it's a pain to submit SPAM and HAM messages to feed
> bayes...
<how to submit ham/spam snipped>
I have SA running in a relay situation. It has only handled 50K messages so
far, and I have yet to train it with either spam or ham.
I just set my scores high enough. I only catch about 70% of the incoming
spam, but I have no false positives, or at most a statistically insignificant
number of them.
I'd have to manually examine the scores to determine my actual block rate;
this is a guesstimate based on the number of messages at each spam score.
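
A rough sketch of that kind of estimate, assuming Python and a count of
messages per rounded spam score. The counts below are made up purely for
illustration and are not figures from my server. The function only gives the
share of all mail at or above a threshold; turning that into a true block rate
still needs a judgment about how much of the lower-scoring mail was spam.

```python
from typing import Dict


def fraction_at_or_above(histogram: Dict[int, int], threshold: int) -> float:
    """Share of all messages whose rounded score is >= threshold."""
    total = sum(histogram.values())
    if total == 0:
        return 0.0
    tagged = sum(n for score, n in histogram.items() if score >= threshold)
    return tagged / total


# Hypothetical message counts per rounded score (illustration only).
example = {0: 400, 2: 300, 5: 100, 8: 120, 12: 80}

if __name__ == "__main__":
    for threshold in (5, 8, 12):
        share = fraction_at_or_above(example, threshold)
        print(f"threshold {threshold}: {share:.0%} of all mail tagged")
```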

Really it all comes down to how much spam you want to block and how
many false positives you are willing to accept.
It's fascinating that while the volume of spam varies from day to day, the
average spam score varies by less than 0.25 points on any given day.

-- 
Luke Computer Science System Administrator
Security Administrator,College of Engineering
Montana State University-Bozeman,Montana

RE: selling a client on SA

Posted by Todd Schuldt <ts...@ised.org>.

Hrrm - are you sure it's learning properly with the emails as attachments
(and with multiple attachments)?

I know that when you open the attachment you can see the properties, but when
it's stored, won't bayes misunderstand, since the spam is encapsulated within
a non-spam message (or are you manually extracting the attachments for
placement within IMAP)?  Using a quick shorthand where 0 = the wrapper
message and 1, 2, 3 = the attachments, so that the submitted message is
0+1+2+3: is the bayes learning process going to drop 0 and learn 1, 2 and 3
separately, or is it going to learn 0+1+2+3 as one message and create
additional tokens based on spurious correlations between attachments 1, 2
and 3?

If bayes is learning just the attachment portions and not the wrapper
message, how much of a support issue is it to ensure that the attachments
your clients send are in a bayes-readable format (i.e. not in winmail.dat)?

This seems to me to be a whole lot of effort beyond a global imap
connection.

Todd


Re: selling a client on SA

Posted by "Gregory Sloop, Sloop Network & Computer Consulting" <ls...@sloop.net>.

>
> And in an environment with (we're stuck with it) Windows Outlook Express
> for the mail readers and Linux for the spam filters generating the spam
> and ham databases is a royal pita. (I know *I* am not about to run
> Outlook. And converting to another mail tool at this point is somewhat
> er "awkward" to say nothing about painful.)
>

I assume you mean that it's a pain to submit SPAM and HAM messages to feed
bayes...

It isn't as big a PITA as you might think. (I added this to the Wiki
recently, so perhaps you've not seen it.) It is more work than "auto"
processing - but for small sites, I much prefer to hand check submissions.
(We are using a shared bayes DB for, say, 50 or fewer users.)

You CAN get messages from users unmodified pretty easily. You don't have to
use IMAP shared folders either.

First I set up two mail drop boxes - call them spam@domain.com and
ham@domain.com.

Then I simply have users do this in Outlook/Outlook Express:

1) Open a new mail message.
2) Address the new message to the correct drop box (spam@domain.com or
ham@domain.com).
3) Drag the messages that apply, ham or spam, into the new message - they'll
be sent as attachments.
(Make sure users don't mix ham and spam in the same message.
That will make life a pain!)

Then I can pick up the messages myself with IMAP, review them for real
hammy/spammy-ness, and drag them into another IMAP folder for processing.
Then I use sa-learn to teach bayes on the IMAP folder using the --mbox option.
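
As a rough illustration of that last step, here is a sketch in Python,
assuming the reviewed messages are already available as raw bytes. It writes
them to an mbox file with the standard mailbox module and builds (but does
not run) the sa-learn command line. The function names and file name are
hypothetical; --spam, --ham and --mbox are the sa-learn options I mean.

```python
import mailbox
import tempfile
from pathlib import Path
from typing import Iterable, List


def write_mbox(messages: Iterable[bytes], path: Path) -> int:
    """Append raw messages to an mbox file and return how many were added."""
    box = mailbox.mbox(str(path))
    count = 0
    try:
        for raw in messages:
            box.add(raw)  # mbox.add accepts a message as bytes
            count += 1
    finally:
        box.close()  # close() also flushes pending changes to disk
    return count


def sa_learn_command(kind: str, path: Path) -> List[str]:
    """Build the sa-learn argument list for a reviewed spam or ham mbox."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", f"--{kind}", "--mbox", str(path)]


if __name__ == "__main__":
    reviewed = [b"From: a@example.com\nSubject: offer\n\nbuy now\n"]
    target = Path(tempfile.mkdtemp()) / "reviewed-spam.mbox"
    print(write_mbox(reviewed, target), "message(s) written")
    print(" ".join(sa_learn_command("spam", target)))
```

Keeping spam and ham in separate mbox files mirrors the two drop boxes, so
each file can be fed to sa-learn with the matching option.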

(Plus, I can do all of this remotely. I'm a consultant, and the more I can
do from off site, the better!)
(I'm a newbie too, so perhaps I'm misunderstanding things - but this is a
pretty decent way to do things.)

Greg


Re: selling a client on SA

Posted by jdow <jd...@earthlink.net>.

From: "Thomas Bolioli" <tp...@terranovum.com>

> I have a client who is vacillating between SA and some commercial 
> products. Most of the commercial products are not as accurate as SA but 
> my client has some valid points. I have limited experience setting up SA 
> (always plain vanilla installs through RH and single user systems) and I 
> wanted to get some feedback on what is possible/feasible from the list. 
> It would be used in a multi mail server env where all external mail 
> would be routed through it for checks and modified headers, subjects 
> would tag spam.
> The issues are:
> 1) Does SA auto upgrade to newer versions (of the spam defs more 
> importantly) or could it be set up to do so through an automated CPAN 
> install?
> 2) Can individual users easily have their own training files (I know it 
> is possible, looking more for feasibility) when they do not have an 
> account on the system. ie; their training file could be located via the 
> addressed email. If so,  is there anything out there to manage that? If 
> not would it be possible to write a mail handler to parse the mail 
> addressed to a spam@host mail box that users could forward spam to? This 
> script could then use the from line to determine which user to train on.
> 3) What other issues will I run into in a multi user environment where 
> the SA server is simply forwarding on mail between MTAs and not the 
> final delivery agent?
> 
> Thanks in advance,
> Tom

Setup is not all that hard. It takes time to train the filters. My
experience indicates this is required. Generally each user must train
their own Bayesian filter or else a generic training must be set up by
the administrator. That involves looking at spam and ham both as they
come through. *I* would not do that since it involves looking at
another person's mail.

And in an environment with (we're stuck with it) Windows Outlook Express
for the mail readers and Linux for the spam filters, generating the spam
and ham databases is a royal pita. (I know *I* am not about to run
Outlook. And converting to another mail tool at this point is somewhat
er "awkward" to say nothing about painful.)

Note that I am still using SpamAssassin in preference over other
potential tools. It works. I have personal control over it. But then,
I am a programmer by trade these days. I'd not try to install it for
my brother under these conditions, for example. (He's a rather er
unimaginative counter of Ford automobile dealership beans.)

{^_^}

Re: selling a client on SA

Posted by Tim Stoop <ti...@cidev.nl>.

On Monday 16 February 2004 16:08, Thomas Bolioli wrote:
> 2) Can individual users easily have their own training files (I know it
> is possible, looking more for feasibility) when they do not have an
> account on the system. ie; their training file could be located via the
> addressed email. If so,  is there anything out there to manage that? If
> not would it be possible to write a mail handler to parse the mail
> addressed to a spam@host mail box that users could forward spam to? This
> script could then use the from line to determine which user to train on.

Actually, I'm deploying SpamAssassin as we speak in just the kind of 
environment you describe. For the record, I'm trying to automate a server 
that provides Virtual Servers for clients. They get an all-in package at my 
place, which means unlimited email accounts, unlimited forwarders, 
SpamAssassin, virus scanning, etc.

I'm using PostgreSQL as a central information server. Postfix (my preferred 
MTA) gets most of its configuration from PostgreSQL, and Courier-IMAP and 
-POP get their information from the same PostgreSQL catalog. This week, I'm 
going to install the SA that's in Subversion, because it supports keeping 
Bayes and AWL entirely in a database. That means everything will be done in 
the database, and users only need disk space for their maildir (which could 
actually be in the DB too, but we've chosen a different path for that).

What you describe should be possible with Michael's neat DB patches for 
Bayes and AWL in a DB. I'll try to get it working this week, and if I 
succeed I will hopefully write a HowTo on it in the coming six weeks.

The largest difficulty I see at this time is how to get procmail in 
order, so people can teach their own filters and so on. I'll find a way, 
though.

-- 
Kind regards,
Tim Stoop
Complete Internet Development
http://www.cidev.nl

Random quote/fortune:
You're dead, Jim. -- McCoy, "Amok Time", stardate 3372.7

Re: selling a client on SA

Posted by Paolo Cravero as2594 <pc...@as2594.net>.

Thomas Bolioli wrote:

Hi Thomas

> 1) Does SA auto upgrade to newer versions (of the spam defs more 
> importantly) or could it be set up to do so through an automated CPAN 
> install?

No, and you might not like it if it did. Auto-update sounds too commercial 
to me, and it takes away from you the privilege of knowing what goes on 
during an update (including what went wrong and why).

Look, when I moved from SA 2.55 to 2.61 I had to rebuild the Bayes 
database, which required stopping the spamd daemon. 2.7 might work only on 
Perl 5.8+ ... you don't want auto-update to update Perl as well, do 
you?! :-)

> 2) Can individual users easily have their own training files (I know it 
> is possible, looking more for feasibility) when they do not have an 

Look, here we are routing about 2GB of incoming Internet mail every day 
(40000+ mails per day) to about 5000-6000 mailboxes spread over a 
dozen domains. None of the individuals has ever asked for rules 
customization. I think it's a matter of how you "sell" the antispam 
solution (guarantee a 90% hit ratio, so they'll tolerate what passes 
through, then produce statistics).

> 3) What other issues will I run into in a multi user environment where 
> the SA server is simply forwarding on mail between MTAs and not the 
> final delivery agent?

Bounces! One of the domains here receives almost only spam (90% of its 
whole traffic). The bounces usually go back to nonexistent addresses.

High-Availability is another issue.

Rather than marking subjects you might consider quarantining spam on the 
transit MTA. This reduces bounces too!

Paolo