Posted to users@spamassassin.apache.org by Jason Frisvold <xe...@gmail.com> on 2006/11/27 23:01:40 UTC
Bayes - Optimizing the database
Greetings,
After struggling a bit with Bayes in general and trying to figure out
a way to make things run a bit faster, I've done some serious digging
and I want to clarify a few things before I make a mess of my Bayes
DB...
I have everything currently set up to use a MySQL database. The
bayes_token table is about 3GB in size and tends to be the slowest
link in the system. (AWL isn't too far behind, but I think I have a
viable strategy for dealing with that monster.)
First, some quick assumptions. Please correct me if I'm wrong.
All of the bayes_ tables are directly related via the id field.
bayes_token contains the actual tokens for Bayesian processing and
bayes_seen contains the message IDs of messages Bayes has already
processed for tokens, presumably to reduce CPU usage? I *think*
bayes_vars merely contains the magic data used by Bayes, and I have no
idea what bayes_expire is for. Am I correct thus far?
Now, given that, I can directly map my users to an entry in bayes_vars
and identify their "id". With that, I can purge nonexistent users
from the system. Simple enough.
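As a rough sketch of the purge described above, and not a tested procedure: the code below assumes the table and column names of the stock SQL Bayes schema (bayes_vars with id and username; bayes_token, bayes_seen and bayes_expire keyed by id), which is exactly the assumption being asked about here. It uses Python's built-in sqlite3 as a stand-in so it runs anywhere; against MySQL the placeholder style would be %s rather than ?. The names purge_orphans and make_demo_db are hypothetical helpers for illustration only.

```python
import sqlite3

# Assumption: each of these tables has an "id" column matching bayes_vars.id.
CHILD_TABLES = ("bayes_token", "bayes_seen", "bayes_expire")


def purge_orphans(conn, valid_usernames):
    """Delete all Bayes rows for bayes_vars users not in valid_usernames.

    Returns the list of usernames that were removed.
    """
    cur = conn.cursor()
    cur.execute("SELECT id, username FROM bayes_vars ORDER BY id")
    orphans = [(uid, name) for uid, name in cur.fetchall()
               if name not in valid_usernames]
    for uid, _name in orphans:
        # Remove the per-user rows first, then the bayes_vars entry itself.
        for table in CHILD_TABLES:
            cur.execute(f"DELETE FROM {table} WHERE id = ?", (uid,))
        cur.execute("DELETE FROM bayes_vars WHERE id = ?", (uid,))
    conn.commit()
    return [name for _uid, name in orphans]


def make_demo_db():
    """Build a tiny in-memory database with a simplified, assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE bayes_vars (id INTEGER PRIMARY KEY, username TEXT);
        CREATE TABLE bayes_token (id INTEGER, token TEXT);
        CREATE TABLE bayes_seen (id INTEGER, msgid TEXT);
        CREATE TABLE bayes_expire (id INTEGER, runtime INTEGER);
        INSERT INTO bayes_vars VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO bayes_token VALUES (1, 't1'), (2, 't2'), (2, 't3');
        INSERT INTO bayes_seen VALUES (1, 'm1'), (2, 'm2');
        INSERT INTO bayes_expire VALUES (2, 5);
    """)
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print(purge_orphans(db, {"alice"}))
```

On a 3GB bayes_token table it would probably be wise to run this off-peak and on a copy first, since per-id deletes of that size can hold locks for a while.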
Now, for other users, can I trust the last_expire field in bayes_vars
and formulate something to force-expire at periodic intervals based on
that value? I realize that spamc/spamd already expire when necessary,
but I think I'd rather run this on a nightly basis during off-peak
hours, and serialize it so that only a single user is being expired at
a time. Is that a reasonable move to reduce overall CPU usage on the
system?
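To make the nightly idea concrete, here is a minimal sketch of the serialized expiry. It assumes that last_expire holds Unix epoch seconds and that (username, last_expire) pairs have already been read from bayes_vars; both are assumptions, not confirmed facts. It relies on sa-learn's --force-expire option with -u to select the user. The function names and the one-day interval are hypothetical.

```python
import subprocess


def users_due(rows, now, min_interval):
    """Return usernames whose last expiry is at least min_interval seconds old.

    rows is an iterable of (username, last_expire) pairs; the result is
    ordered so the user who has waited longest is expired first.
    """
    due = [(last, user) for user, last in rows if now - last >= min_interval]
    return [user for _last, user in sorted(due)]


def expire_command(username):
    """Build the sa-learn command line that forces an expiry run for one user."""
    return ["sa-learn", "--force-expire", "-u", username]


def run_nightly(rows, now, min_interval=86400, runner=subprocess.run):
    """Expire one user at a time; each call blocks until sa-learn exits."""
    done = []
    for user in users_due(rows, now, min_interval):
        runner(expire_command(user), check=False)
        done.append(user)
    return done
```

Called from cron during off-peak hours with now=int(time.time()), this never has more than one expiry running at once. It does not stop spamd from also expiring opportunistically, which would presumably have to be turned off separately (bayes_auto_expire 0) for the schedule to be the only trigger.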
Thanks!
--
Jason 'XenoPhage' Frisvold
XenoPhage0@gmail.com