Posted to users@spamassassin.apache.org by Max Clark <ma...@gmail.com> on 2006/04/17 22:20:24 UTC

Managing Spamassassin Data

Hi all,

After having spamd exit on me a couple of times (still no idea why), I
decided to put spamd under daemontools control (run file below). While
this has resulted in the stability I was looking for, I am now
presented with a number of growing log/spamassassin files - i.e.:

/service/spamd/razor-agent.log
/root/.spamassassin/auto-whitelist
/root/.spamassassin/bayes_journal
/root/.spamassassin/bayes_seen
/root/.spamassassin/bayes_toks

My question breaks into several parts:

1. The startup script from the FreeBSD port for spamd ran the service
as root - is there any reason not to switch spamd to the qpsmtpd
user/group?

2. Is there a way I can put the razor-agent.log into multilog? If not,
how do I rotate this log file?

3. My experience with Bayes and AWL on amavisd-new is that these files
will only grow; what is the proper approach to pruning this data?

4. I am considering using an external MySQL database for my Bayes
database - how much should I expect this to trash my server, and what
is the proper approach to pruning this data?

Thanks in advance,
Max

#!/bin/sh
# daemontools run file for spamd: send stderr to stdout so the
# supervise/multilog pipeline can capture it, then exec spamd directly
exec 2>&1
exec /usr/local/bin/spamd \
    -x \
    --socketpath=/var/run/spamd/spamd \
    -s stderr
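(If you also want spamd's own output under multilog, which is what the
2>&1 and "-s stderr" above are for, a companion log service along these
lines should do it. This is only a sketch: the ./main directory, the
size/count limits and the account running multilog are illustrative, and
note that razor-agent.log is written straight to disk by the razor
agents, so it will not flow through multilog.)

#!/bin/sh
# /service/spamd/log/run - feed spamd's stderr/stdout into multilog
# (assumes ./main exists and is owned by the account running multilog)
exec setuidgid nobody multilog t s1000000 n20 ./main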

--
 Max Clark
 http://www.clarksys.com

Re: Managing Spamassassin Data

Posted by Theo Van Dinter <fe...@apache.org>.
On Mon, Apr 17, 2006 at 05:44:57PM -0400, Kris Deugau wrote:
> >2. Is there a way I can put the razor-agent.log into multilog? If not,
> >how do I rotate this log file?
> 
> Set up a cron job to run 'find / -name "razor-agent.log" |xargs rm -f'.  <g>

Alternately, put the following in razor-agent.conf:

debuglevel             = 0

The log still gets created, but nothing ever goes into it.
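If you'd rather keep some logging but at least control where the file
lands, razor-agent.conf also takes a logfile path (see the
razor-agent.conf man page; assuming your razor-agents version supports
the directive, the values below are illustrative):

# razor-agent.conf
debuglevel = 1
logfile    = /var/log/razor-agent.log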

-- 
Randomly Generated Tagline:
You tell 'em Cemetery, You are so grave.

Re: Managing Spamassassin Data

Posted by Kris Deugau <kd...@vianet.ca>.
Max Clark wrote:
> 2. Is there a way I can put the razor-agent.log into multilog? If not,
> how do I rotate this log file?

Set up a cron job to run 'find / -name "razor-agent.log" |xargs rm -f'.  <g>

I've found razor is a little indiscriminate about where it spews this 
"log" file;  I've found it in some truly bizarre places.  That may have 
something to do with the razor-agents version I'm running (don't even 
recall - it's probably a little old and outdated).

If at all possible, look into disabling it entirely.  IIRC it's just a 
transcript of what razor does when it's called to scan a message.

-kgd

Re: Managing Spamassassin Data

Posted by "Gary D. Margiotta" <ga...@tbe.net>.
>
> 2. Is there a way I can put the razor-agent.log into multilog? If not,
> how do I rotate this log file?
>

For myself on FreeBSD, I installed from source, not from the port, so 
adjust your configs as necessary, but I use the newsyslog facility 
(/etc/newsyslog.conf) to rotate the log files with the nightly checks:

The maillog is rotated nightly:
/var/log/maillog                        640  120   *    @T00  JC

So, I added another entry for my spam log:
/var/log/spam.log                       640  120   *    @T00  JC

I've added several logfiles to the file to auto-rotate, such as named, and 
it works like a charm.
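If razor-agent.log ends up in a predictable spot, a similar entry should 
cover it as well. A sketch only, assuming razor reopens the file on each 
run so nothing needs to be signalled (the N flag) and rotating by size:

/service/spamd/razor-agent.log          640  5    100  *     JCN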

My relevant config bits:

How I start spamd:
/usr/local/bin/spamd --daemonize --username spamd --max-children=20 --min-spare=5 --pidfile /home/spamd/spamd.pid -s local5

(notice the "local5" part at the end, which tells spamd to log to the 
local5 syslog facility)

The relevant syslog config:
local5.*                                        /var/log/spam.log

Hope this helps.

-Gary

Re: Managing Spamassassin Data

Posted by Matt Kettler <mk...@evi-inc.com>.
Max Clark wrote:
> Hi all,
> 
> After having spamd exit on me a couple of times (still no idea why), I
> decided to put spamd under daemontools control (run file below). While
> this has resulted in the stability I was looking for, I am now
> presented with a number of growing log/spamassassin files - i.e.:
> 
> /service/spamd/razor-agent.log
> /root/.spamassassin/auto-whitelist
> /root/.spamassassin/bayes_journal
> /root/.spamassassin/bayes_seen
> /root/.spamassassin/bayes_toks
> 
> My question breaks into several parts;
> 
> 1. The startup script from the FreeBSD port for spamd ran the service
> as root - is there any reason not to switch spamd to the qpsmtpd
> user/group?

I would not start it as qpsmtpd, as spamd needs privs to bind its port. However,
if you're not doing multi-user bayes, you can start spamd with -u.

Also, be aware: the above bayes db in root's home directory should never be used
by spamd. Spamd always setuids itself to "nobody" if it finds itself running as
root when being called to scan mail. Normally spamd gets started as root, then
setuids to match the userid that calls spamc. However, if root is calling
spamc, then spamd falls back on this safety and setuids to nobody.

If you start spamd with -u, it will setuid to the specified user, without regard
for what user called spamc. This should show up as a bunch of spamds that start
as root, and switch to the -u user when scanning.
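For example, adapting the run file from the original post (the "spamd"
account name here is purely illustrative; use whatever unprivileged user
you've set up for this):

exec /usr/local/bin/spamd -u spamd -x \
    --socketpath=/var/run/spamd/spamd -s stderr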


> 
> 2. Is there a way I can put the razor-agent.log into multilog? If not,
> how do I rotate this log file?

Can't say as I know. This is generated by the razor tools themselves, so "man
razor-agent.conf" would be the reference here. However, judging from the docs,
this only supports plain "dumb file" access, so there's probably no safe way to
rotate it short of killing off SA.

http://razor.sourceforge.net/docs/doc.php?type=pod&name=razor-agent.conf

> 
> 3. My experience with Bayes and AWL on amavisd-new is that these files
> will only grow; what is the proper approach to pruning this data?

bayes_journal and bayes_toks should prune on their own during opportunistic
expiry. However, you can run "spamassassin --force-expire" to force this.
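For instance, a nightly cron entry along these lines (run as whichever
user owns the bayes files; the time and path are illustrative):

30 3 * * *    /usr/local/bin/spamassassin --force-expire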

bayes_seen does not get pruned by expiry, but it can be safely deleted in SA
3.1.0 and SA will re-create it. (or so the devels claim, I've not tested this)

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=2975

auto-whitelist can be pruned using the check_whitelist script from the tools
directory of the tarball; pass it the --clean parameter. (note: most distro
packages do not install this tool, so just grab it from a tarball download if
you don't have it).


usage: check_whitelist [--clean] [--min n] [dbfile]

"min" specifies the minimum number of "hits" a given AWL entry needs for
it to be considered worth keeping. min defaults to 2 if not specified (this
prunes all "one-off" entries).
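For example, against the auto-whitelist file from the original post
(assuming you've copied check_whitelist somewhere convenient and made it
executable):

./check_whitelist --clean --min 2 /root/.spamassassin/auto-whitelist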


> 
> 4. I am considering using an external MySQL database for my Bayes
> database - how much should I expect this to trash my server, and what
> is the proper approach to pruning this data?
> 

Dunno, I'm not a SQL-bayes user.
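(From the docs, though, the switch boils down to a few local.cf options
plus loading the schema from the sql/ directory of the tarball. A minimal
sketch, with the DSN and credentials purely illustrative; pruning works
the same way as with the DBM files, i.e. opportunistic expiry or
"spamassassin --force-expire".)

# local.cf - move Bayes storage into MySQL (illustrative values)
bayes_store_module      Mail::SpamAssassin::BayesStore::SQL
bayes_sql_dsn           DBI:mysql:bayes:localhost
bayes_sql_username      bayes
bayes_sql_password      secret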