Posted to users@spamassassin.apache.org by Mark Williams <ma...@gmail.com> on 2005/07/19 09:32:11 UTC

Early Questions

Hi All (just joined, so please be gentle;-) ),

I have just installed spamassassin v3.0.4 in a test environment (which
is a mirror of the live environment) and have a number of questions,
which I can not see within the manuals/support documentation.

Firstly, this is my configuration:

Server: Linux (RH9.0), with spamassassin installed from
spamassassin.org web site using "make" etc. (not RPMs). This
machine then runs both IMAP and POP3 for clients. MTA is sendmail

Client(s): Windows XP. All running Windows XP and MS Outlook 2000. All
users connect to POP3 Server (on Linux machine) and use PST files to
download their e-mail(s).

General: Setup is such that spamassassin is site wide (not per user) -
as per management request. All working fine at the moment - just about
to "switch on bayes"

Questions:

(q1) Given that this is a site-wide installation, how do I get the
requisite 200 e-mails (spam/ham) for spamassassin to work with? Where
should I put these (an individual mailbox)?


Thanks

Mark

Re: Early Questions

Posted by Matt Kettler <mk...@evi-inc.com>.

Mark Williams wrote:
> (q1) Given that this is a site-wide installation, how do I get the
> requisite 200 e-mails (spam/ham) for spamassassin to work with? Where
> should I put these (an individual mailbox)?

It doesn't matter where you put them; what you need to do is feed them to
sa-learn --ham and sa-learn --spam. After sa-learn has examined them and added
their tokens to its bayes DB, the emails are no longer needed.
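A minimal Python sketch of that workflow, assuming the collected mail has been saved with full headers intact, one message per file, under two hypothetical directories (/var/tmp/corpus/spam and /var/tmp/corpus/ham). sa-learn accepts such directories as arguments. By default the sketch only builds the command lines; pass dry_run=False to actually run them, as the user your mail scanning runs as.

```python
import subprocess
from pathlib import Path
from typing import Dict, List

# Hypothetical corpus layout: one directory of spam, one of ham,
# each holding one raw message (full headers intact) per file.
CORPUS = {
    "spam": Path("/var/tmp/corpus/spam"),
    "ham": Path("/var/tmp/corpus/ham"),
}

def sa_learn_command(kind: str, path: Path) -> List[str]:
    """Return the argv that feeds one directory to sa-learn as spam or ham."""
    if kind not in ("spam", "ham"):
        raise ValueError(f"kind must be 'spam' or 'ham', not {kind!r}")
    return ["sa-learn", f"--{kind}", str(path)]

def train(corpus: Dict[str, Path], dry_run: bool = True) -> List[List[str]]:
    """Build (and, unless dry_run, execute) one sa-learn call per directory."""
    commands = [sa_learn_command(kind, path) for kind, path in corpus.items()]
    if not dry_run:
        for argv in commands:
            # check=True raises CalledProcessError if sa-learn exits non-zero
            subprocess.run(argv, check=True)
    return commands

if __name__ == "__main__":
    for argv in train(CORPUS):
        print(" ".join(argv))
```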

You'll need to do your sa-learn runs as the same user your mail scanning gets 
executed as. Since you're using sendmail this will likely be root.

However, if you use spamd, it will be very averse to scanning mail while running 
as root, and will setuid itself to "nobody" to prevent security holes. The home 
directory for "nobody" isn't writable by nobody, so SA won't use bayes while 
this is going on. (And don't fix it by giving nobody a home dir that it can 
write to! Many processes use nobody and expect it to be homeless. Giving it a 
homedir can weaken your system's security in the event an exploit occurs.)

What you'll want to do in this case is create a separate "spamd" user and add
"-u spamd" to your spamd startup options. Then, when you want to learn mail, su
yourself to spamd.
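A sketch of the "su yourself to spamd" step in Python, assuming the dedicated user is literally named spamd and the learning script is started as root. The su flags used here (-s to supply a shell in case the account has none, -c to pass the command) are the common Linux ones; check su(1) on your system. Since the Bayes DB lives under the learning user's home directory by default (~/.spamassassin/), learning as any other user trains a database spamd never reads.

```python
import os
import pwd
import shlex
from typing import List, Optional

# Hypothetical name: the dedicated user spamd was started with ("-u spamd").
LEARN_USER = "spamd"

def current_username() -> str:
    """Name of the effective user this process runs as."""
    return pwd.getpwuid(os.geteuid()).pw_name

def as_learn_user(argv: List[str], learn_user: str = LEARN_USER,
                  current_user: Optional[str] = None) -> List[str]:
    """Make an sa-learn argv run as learn_user so it updates that user's Bayes DB.

    If we already are learn_user, the argv is returned unchanged; otherwise it
    is wrapped in `su -s /bin/sh -c '<command>' <user>` (this normally needs
    root to avoid a password prompt).
    """
    if current_user is None:
        current_user = current_username()
    if current_user == learn_user:
        return list(argv)
    # Quote each argument so paths with spaces survive the shell
    command = " ".join(shlex.quote(arg) for arg in argv)
    return ["su", "-s", "/bin/sh", "-c", command, learn_user]
```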

Re: Early Questions

Posted by Kai Schaetzl <ma...@conactive.com>.

Mark Williams wrote on Tue, 19 Jul 2005 08:32:11 +0100:

> (q1) Given that this is a site-wide installation, how do I get the 
> requisite 200 e-mails (spam/ham) for spamassassin to work with?

Collect them from the mailboxes you are allowed to check. Maybe just your 
own.

> Where should I put these (an individual mailbox)?

You actively feed the messages to Bayes via sa-learn.

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org

Re: Early Questions

Posted by DAve <da...@pixelhammer.com>.

Jay Lee wrote:
> Mark Williams wrote:
> 
>> I have just installed spamassassin v3.0.4 in a test environment (which
>> is a mirror of the live environment) and have a number of questions,
>> which I can not see within the manuals/support documentation.
>>
>> Firstly, this is my configuration:
>>
>> Server: Linux (RH9.0), with spamassassin installed from
>> spamassassin.org web site using "make" etc. (not RPMs). This
>> machine then runs both IMAP and POP3 for clients. MTA is sendmail
>>
> Surely you're not going live with a distribution as old and unsupported as 
> RedHat 9!  Do you want to become a spam zombie?  I urge you strongly to 
> look at moving up to RedHat Enterprise Linux 4, CentOS 4 or a recent 
> Fedora release.  Also, you really should stick with the RPMs; it makes 
> management and future upgrades much smoother.
> 

Please don't say things like that. RedHat 9 can be perfectly secure and 
reliable. I have seen new installs of RedHat turned into IRC bots 
overnight by virtue of their poor use of RPMs.

I would have good faith in a server running a three-year-old kernel, 
locked down by an admin who built from his own sources and knew his s*&t 
inside and out. I would have no faith in a server that was running the 
"latest" distro/RPM/package just because it was "the latest".

If you rely on the version number of your distro and the build skills of 
an unknown party to be the extent of your security awareness, you are 
certain to end up on someone's RBL.

DAve

Re: Early Questions

Posted by Loren Wilton <lw...@earthlink.net>.

> >> Use bayes autolearning so that you don't have to bother too much.
> >> Also set up some aliases like ham@you.com and spam@you.com where users
> >> can forward wrongly classified mail for you to reclassify.  Don't try
> >> to use someone else's bayes db and don't use just your personal email
> >> since it won't match the bayes characteristics of the entire company.
> >
> >
> The problem, of course, with this solution is that you cannot simply
> forward messages to a mailbox and train spamassassin.  If you do, you

This is a good point: *forwarding* the message to the public folder in
Outlook or OE will screw it up badly and make learning it
counter-productive.

However, *copying* or *moving* the message to the public folder in
Outlook/OE, while it may somewhat mangle the headers (depending on
Outlook/Exchange version) will usually do a good enough job of recreating
the original spam with more or less the original headers.

Users should be told that they have to move/copy/drag the message to the
public folder, NOT forward it.
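If users forward mail to a training mailbox anyway, a rough filter can catch the obvious inline forwards before they reach sa-learn. This Python sketch is a heuristic, not a SpamAssassin feature: it assumes Outlook/OE-style "FW:"/"Fwd:" subject prefixes and the "-----Original Message-----" separator line, and treats any message carrying a message/rfc822 part as a forward-as-attachment rather than an inline forward.

```python
import email
import re
from email import policy

# Heuristics (assumptions, not a SpamAssassin feature): Outlook/OE prefix
# forwarded subjects with "FW:"/"Fwd:" and insert this separator line
# when forwarding inline.
FORWARD_SUBJECT = re.compile(r"^\s*(fw|fwd)\s*:", re.IGNORECASE)
ORIGINAL_SEPARATOR = "-----Original Message-----"

def looks_inline_forwarded(raw: bytes) -> bool:
    """Guess whether `raw` is an inline forward rather than the original mail."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    # A message/rfc822 part means the original was attached, not inlined.
    if any(part.get_content_type() == "message/rfc822" for part in msg.walk()):
        return False
    if FORWARD_SUBJECT.match(str(msg.get("Subject", ""))):
        return True
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            if ORIGINAL_SEPARATOR in part.get_content():
                return True
    return False
```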

        Loren

Re: Early Questions

Posted by "Matthew D. Sill" <ms...@eeinternet.com>.

>>>
>> Use bayes autolearning so that you don't have to bother too much.  
>> Also set up some aliases like ham@you.com and spam@you.com where users 
>> can forward wrongly classified mail for you to reclassify.  Don't try 
>> to use someone else's bayes db and don't use just your personal email 
>> since it won't match the bayes characteristics of the entire company.
>
>
The problem, of course, with this solution is that you cannot simply 
forward messages to a mailbox and train spamassassin.  If you do, you 
will be teaching spamassassin that the user/client who sent the message 
is spammy/hammy.  This is almost never what you want. 
Perhaps what Gary V says is even better.  (I posted this same question 
to the amavis users list a year ago.)

A message that is forwarded from a user is not the same message that
was originally received (because it has a new set of headers). If
you are using 'sa-learn --spam' then you *are* training bayes to
recognize stuff that comes from you as spam. You need to save the
original message in its entirety with the original headers intact.
Your MUA may have the ability to export the message as .EML or .TXT.
Make sure the file that results from the export is plain text.
If you are on a Windows machine, you could export the messages
to a folder, then copy the folder over to your Debian box using
WinSCP or other means.
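A quick sanity check for exported files before copying them to the Linux box, sketched in Python. It assumes that a failed plain-text export typically shows up as Outlook's native .msg format, an OLE2 compound file recognisable by its 8-byte D0 CF 11 E0 A1 B1 1A E1 signature, and it uses the presence of a Received: header as a rough sign that the original headers survived. The function name check_export is hypothetical.

```python
import email
from email import policy
from typing import List

# Outlook's native .msg format is an OLE2 compound file starting with
# this 8-byte signature; sa-learn cannot read it.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

def check_export(raw: bytes) -> List[str]:
    """List reasons an exported message looks unfit for sa-learn (empty = looks OK)."""
    if raw.startswith(OLE2_MAGIC):
        return ["binary Outlook .msg (OLE2) file, not plain text"]
    problems = []
    if b"\x00" in raw:
        problems.append("contains NUL bytes (binary or UTF-16 export?)")
    msg = email.message_from_bytes(raw, policy=policy.compat32)
    if not msg.keys():
        problems.append("no header block found")
    elif "Received" not in msg:
        # Header lookup is case-insensitive
        problems.append("no Received: headers; original headers may have been lost")
    return problems
```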

Cheers,
-Matt



-- 
-----------------------------------------------------|
Matthew D. Sill : Developer/Analyst
ph# (907) 456-5581 : fax# (907) 456-3111
Engineering & Environmental Internet Solutions, LLC
530 7th Ave. Suite #1 : Fairbanks, AK 99701
-----------------------------------------------------|

Re: Early Questions

Posted by Jim Maul <jm...@elih.org>.

Jay Lee wrote:
> Use bayes autolearning so that you don't have to bother too much.  Also 
> set up some aliases like ham@you.com and spam@you.com where users can 
> forward wrongly classified mail for you to reclassify.  Don't try to use 
> someone else's bayes db and don't use just your personal email since it 
> won't match the bayes characteristics of the entire company.  Note that 
> you can also modify the number of spam and ham messages the bayes db 
> needs before it starts scoring with these two settings in local.cf:
> 
> bayes_min_ham_num 100
> bayes_min_spam_num 50
> 
> Be careful about setting them too low, though: the less bayes knows about 
> your org's email characteristics, the more likely false positives are.
> 
> Jay
> 
> 
I just wanted to add something to this quickly.  You may also want to 
(perhaps even *should*) alter the autolearn thresholds if you are going 
to use bayes autolearning.  The default values have sometimes been seen to 
autolearn in the wrong direction.  I have changed mine to:

bayes_auto_learn_threshold_nonspam -0.1
bayes_auto_learn_threshold_spam 10.0
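To see what changing these thresholds does, here is a deliberately simplified Python model of threshold-based autolearning. It assumes a message is learned as ham when its score is below bayes_auto_learn_threshold_nonspam and as spam when it is at or above bayes_auto_learn_threshold_spam; the defaults of 0.1 and 12.0 are SpamAssassin's documented ones. Real SpamAssassin computes this score without certain rules (such as the Bayes rules themselves) and applies further checks before learning spam, none of which is modelled here.

```python
from typing import Optional

# SpamAssassin's documented defaults for the two autolearn thresholds.
DEFAULT_NONSPAM = 0.1
DEFAULT_SPAM = 12.0

def autolearn_decision(score: float,
                       nonspam: float = DEFAULT_NONSPAM,
                       spam: float = DEFAULT_SPAM) -> Optional[str]:
    """Simplified model: return "ham", "spam", or None (not learned)."""
    if score < nonspam:
        return "ham"
    if score >= spam:
        return "spam"
    return None
```

With Jim's values (nonspam=-0.1, spam=10.0), a message scoring 0.0 is no longer learned as ham, while one scoring 11.0 is now learned as spam: the ham band shrinks and the spam band widens.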

Also, note that while users can forward email to the spam@ and ham@ 
addresses referred to above, they must do so as an attachment so the 
original email is untouched.  Regular forwarding will add/alter headers 
which will cause bayes nightmares.
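On the receiving side, the original can be pulled back out of a forward-as-attachment before it is learned. This Python sketch uses only the standard email package; it assumes the forwarding client (Outlook, for example) attaches the original as a top-level message/rfc822 part, and returns each attached message with its own headers intact, ready to be written to a file for sa-learn.

```python
import email
from email import policy
from typing import List

def attached_originals(raw: bytes) -> List[bytes]:
    """Return each top-level message/rfc822 attachment of `raw`, headers intact."""
    outer = email.message_from_bytes(raw, policy=policy.compat32)
    if not outer.is_multipart():
        return []
    originals = []
    for part in outer.get_payload():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element list
            # holding the attached message itself.
            originals.append(part.get_payload(0).as_bytes())
    return originals
```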

And finally, as a general note: a lot of people seem not to use bayes for 
one reason or another, and tend to have autolearning disabled.  However, 
with the correct settings and some careful monitoring (at least in the 
beginning), bayes with autolearn can work wonders.
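One way to do that monitoring, sketched in Python under the assumption that your X-Spam-Status headers carry SpamAssassin 3.x's default autolearn=... field (for example "autolearn=ham" or "autolearn=no"). Header values are expected to be unfolded onto one line already; how you collect them (from an mbox, a maildir, or a log) is left open.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed header layout, e.g.
# X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.0.4
AUTOLEARN = re.compile(r"\bautolearn=(\w+)")

def tally_autolearn(status_values: Iterable[str]) -> Counter:
    """Count autolearn outcomes across X-Spam-Status header values."""
    counts = Counter()
    for value in status_values:
        match = AUTOLEARN.search(value)
        counts[match.group(1) if match else "missing"] += 1
    return counts
```

A sudden jump in "spam" or "ham" counts, or messages learned in the direction you did not expect, is the cue to revisit the thresholds.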

BTW, I'm also running RH9 AND SA 2.64, but this was installed almost 2 
years ago ;)  Still running great though.

-Jim

Re: [ot] OS comments (was Early Questions)

Posted by Duncan Hill <sa...@nacnud.force9.co.uk>.

On Tuesday 19 July 2005 18:50, Jay Lee wrote:
> Mark Williams wrote:
> >Server: Linux (RH9.0), with spamassassin installed from
> >spamassassin.org web site using "make" etc. (not RPMs). This
> >machine then runs both IMAP and POP3 for clients. MTA is sendmail
>
> Surely you're not going live with a distribution as old and unsupported as
> RedHat 9!  Do you want to become a spam zombie?  I urge you strongly to

While I understand the sentiment, the wording is pure scare-mongering imo.  I 
know of several deployments of RH9 that are quite secure (due to only running 
2 externally reachable ports and up-to-date software) and have never become 
'spam zombies'.  The speed at which I've seen a Windows box get zombied, on 
the other hand, is mind-numbing :>

That said, considering there are much more up-to-date releases of the RedHat 
line and other OSs, it would be a much better idea to use one of those.  But 
when an RH9 box has been deployed for 2 years and keeps on trucking, there's 
no real point in taking down a 24/7 service to upgrade to a new version.

Re: Early Questions

Posted by Jay Lee <jl...@pbu.edu>.

Mark Williams wrote:

>I have just installed spamassassin v3.0.4 in a test environment (which
>is a mirror of the live environment) and have a number of questions,
>which I can not see within the manuals/support documentation.
>
>Firstly, this is my configuration:
>
>Server: Linux (RH9.0), with spamassassin installed from
>spamassassin.org web site using "make" etc. (not RPMs). This
>machine then runs both IMAP and POP3 for clients. MTA is sendmail
>
>  
>
Surely you're not going live with a distribution as old and unsupported as 
RedHat 9!  Do you want to become a spam zombie?  I urge you strongly to 
look at moving up to RedHat Enterprise Linux 4, CentOS 4 or a recent 
Fedora release.  Also, you really should stick with the RPMs; it makes 
management and future upgrades much smoother.

>Client(s): Windows XP. All running Windows XP and MS Outlook 2000. All
>users connect to POP3 Server (on Linux machine) and use PST files to
>download their e-mail(s).
>
>General: Setup is such that spamassassin is site wide (not per user) -
>as per management request. All working fine at the moment - just about
>to "switch on bayes"
>
>Questions:
>
>(q1) Given that this is a site-wide installation, how do I get the
>requisite 200 e-mails (spam/ham) for spamassassin to work with? Where
>should I put these (an individual mailbox)?
>
Use bayes autolearning so that you don't have to bother too much.  Also 
set up some aliases like ham@you.com and spam@you.com where users can 
forward wrongly classified mail for you to reclassify.  Don't try to use 
someone else's bayes db and don't use just your personal email since it 
won't match the bayes characteristics of the entire company.  Note that 
you can also modify the number of spam and ham messages the bayes db 
needs before it starts scoring with these two settings in local.cf:

bayes_min_ham_num 100
bayes_min_spam_num 50

Be careful about setting them too low, though: the less bayes knows about 
your org's email characteristics, the more likely false positives are.
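A way to check whether the database has passed those thresholds, sketched in Python. It assumes the layout of "sa-learn --dump magic" output, where each "non-token data" line carries its value in the third column; run the dump as the same user that does the scanning. Bayes only starts scoring once both counts are reached. The default minimums of 200 match SpamAssassin's documented defaults for bayes_min_ham_num and bayes_min_spam_num; pass 100/50 to model the settings above.

```python
import re
from typing import Dict

# Assumed layout of `sa-learn --dump magic` lines, e.g.
# 0.000          0        123          0  non-token data: nspam
MAGIC_LINE = re.compile(r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (.+?)\s*$")

def parse_magic(output: str) -> Dict[str, int]:
    """Map each 'non-token data' name (nspam, nham, ...) to its value."""
    counts = {}
    for line in output.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts

def bayes_ready(counts: Dict[str, int], min_ham: int = 200, min_spam: int = 200) -> bool:
    """True once both learned-message counts reach their minimums."""
    return counts.get("nham", 0) >= min_ham and counts.get("nspam", 0) >= min_spam
```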

Jay