Posted to users@spamassassin.apache.org by Thorsten Meinl <Th...@meinl.bnv-bamberg.de> on 2008/12/25 18:54:48 UTC

Bayes-SQL improvements

Hi all,

We have a SpamAssassin installation that serves about 2000 users. Their
Bayes data is stored in a fairly large Postgres database: the bayes_token
table holds about 100 million rows. This often leads to high load on the
machine, especially while bayes_expire is running. I therefore wrote a patch
for SpamAssassin 3.2.4 that splits the bayes_token table into several tables.
The table holding a given user's tokens is looked up in bayes_vars, which has
an additional column, "token_table". New users are automatically assigned to
a table based on the CRC32 checksum of their user name (any other checksum
would have worked, but CRC32 was the easiest because it yields an integer
from which a token-table number can be derived simply). With 10 tables
instead of 1, the patch led to considerably lower load on the machine, and
bayes_expire now takes about 5 hours instead of 20.
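
The assignment scheme described above can be sketched roughly as follows.
This is not the code from the attached patch (SpamAssassin itself is Perl);
it is a minimal Python illustration under several assumptions: the table
number is taken as CRC32 modulo the number of tables, the split tables are
named bayes_token_0 ... bayes_token_9, and bayes_vars is reduced to a
hypothetical three-column table in an in-memory SQLite database standing in
for Postgres.

```python
import sqlite3
import zlib

# Number of token tables; the post reports using 10.
NUM_TOKEN_TABLES = 10


def create_schema(conn: sqlite3.Connection) -> None:
    """Simplified, hypothetical stand-in for bayes_vars with token_table."""
    conn.execute(
        "CREATE TABLE bayes_vars ("
        " id INTEGER PRIMARY KEY,"
        " username TEXT UNIQUE NOT NULL,"
        " token_table INTEGER NOT NULL)"
    )


def table_number_for(username: str, num_tables: int = NUM_TOKEN_TABLES) -> int:
    """Derive a table number from the CRC32 of the user name.

    zlib.crc32 returns an unsigned 32-bit integer in Python 3; reducing it
    modulo num_tables (an assumption) gives a number in [0, num_tables).
    """
    return zlib.crc32(username.encode("utf-8")) % num_tables


def get_token_table(conn: sqlite3.Connection, username: str) -> str:
    """Return the token table for a user, assigning one on first use."""
    row = conn.execute(
        "SELECT token_table FROM bayes_vars WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        # New user: choose a table from the checksum and store the choice,
        # so later lookups read it from bayes_vars instead of recomputing it.
        number = table_number_for(username)
        conn.execute(
            "INSERT INTO bayes_vars (username, token_table) VALUES (?, ?)",
            (username, number),
        )
        conn.commit()
    else:
        number = row[0]
    # Hypothetical naming scheme for the split tables.
    return f"bayes_token_{number}"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    for user in ("alice", "bob", "carol"):
        print(user, get_token_table(conn, user))
```

Because the chosen table is stored in bayes_vars rather than recomputed on
every lookup, existing users would stay where they are even if the number of
tables were changed later; only new users would be affected.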
The patch is attached. If the developers feel it is worth integrating into
the distribution, they are free to do so.

Cheers,

Thorsten