Posted to users@spamassassin.apache.org by Thorsten Meinl <Th...@meinl.bnv-bamberg.de> on 2008/12/25 18:54:48 UTC
Bayes-SQL improvements
Hi all,
We have an installation of SpamAssassin that serves about 2000 users. Their
Bayes data is stored in a Postgres database that is fairly large: the
bayes_token table holds about 100 million rows. This often leads to high load
on the machine, especially while bayes_expire is running. Therefore I wrote a
patch for SpamAssassin (3.2.4) that splits the bayes_token table into several
tables. Which table holds a given user's tokens is looked up in bayes_vars,
which has an additional column "token_table". New users are automatically
assigned to a table based on the CRC32 checksum of their name (any other hash
would have worked, but this one was easiest because it yields an integer from
which a table number can be derived directly). With 10 tables instead of 1,
this patch led to considerably lower load on the machine, and bayes_expire
now takes only about 5 hours instead of 20.
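To illustrate the assignment of new users to a table, here is a minimal sketch
in Python (the patch itself is for SpamAssassin and is not reproduced here).
It assumes the table number is derived by taking the CRC32 checksum modulo the
number of tables; the function names and the "bayes_token_N" naming scheme are
hypothetical and only serve the example.

```python
import zlib

# Assumption: 10 token tables, the figure mentioned in this post.
NUM_TOKEN_TABLES = 10


def token_table_for(username: str, num_tables: int = NUM_TOKEN_TABLES) -> int:
    """Map a user name to a token table number in the range [0, num_tables)."""
    # zlib.crc32 returns an unsigned 32-bit integer in Python 3.
    checksum = zlib.crc32(username.encode("utf-8"))
    # Assumption: a plain modulo turns the checksum into a table number.
    return checksum % num_tables


def token_table_name(username: str, num_tables: int = NUM_TOKEN_TABLES) -> str:
    """Build a hypothetical table name such as 'bayes_token_3'."""
    return f"bayes_token_{token_table_for(username, num_tables)}"


if __name__ == "__main__":
    for user in ("alice", "bob", "carol"):
        print(user, "->", token_table_name(user))
```

Because the result would be stored in bayes_vars.token_table when the user is
created, the checksum only needs to be computed once per user; later lookups
read the stored value, so changing the number of tables does not move existing
users.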
The patch is attached. If the developers feel that it is worth integrating
into the distribution, they are free to do so.
Cheers,
Thorsten