Posted to users@spamassassin.apache.org by Jose Javier Sianes Ruiz <jo...@juntadeandalucia.es> on 2006/12/21 17:41:54 UTC

Spamassassin and Oracle bayesians DB

I'm now studying the possibility of building a very large Bayesian database.
Because of the huge number of users I have (over 100,000, possibly doubling
next year, each with 8MB of Bayesian information in their Maildir), I have
ruled out MySQL and PostgreSQL; my only choice now is Oracle. Is it easy
to integrate with SpamAssassin? How does it work under heavy mail
concurrency? It seems that the row count of the bayes_token table will be
incredibly high (150,000 token entries per user → 15,000,000,000 rows). Any
suggestions for building tablespaces? All experiences or comments will be
much appreciated. Thanks to all.
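
For reference, the sizing figures above can be checked with a short calculation. This is only a back-of-the-envelope sketch: the constants are the numbers quoted in this message, the function name `estimate` is made up for illustration, and decimal units (1 GB = 1000 MB) are assumed.

```python
# Figures taken from the message above; everything else is illustrative.
USERS = 100_000            # current number of users
TOKENS_PER_USER = 150_000  # estimated token entries per user
BAYES_MB_PER_USER = 8      # Bayesian data per user, in MB


def estimate(users):
    """Return (total bayes_token rows, total Bayes data in decimal GB)."""
    rows = users * TOKENS_PER_USER
    total_gb = users * BAYES_MB_PER_USER / 1000
    return rows, total_gb


# Show today's load and the doubled load expected next year.
for users in (USERS, 2 * USERS):
    rows, total_gb = estimate(users)
    print(f"{users:,} users: {rows:,} rows, about {total_gb:,.0f} GB")
```

With the quoted numbers this gives 15,000,000,000 rows and roughly 800 GB today, and twice that if the user count doubles.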

 


Re: Spamassassin and Oracle bayesians DB

Posted by Michael Parker <pa...@pobox.com>.
Jose Javier Sianes Ruiz wrote:
> I'm now studying the possibility of building a very large Bayesian database.
> Because of the huge number of users I have (over 100,000, possibly doubling
> next year, each with 8MB of Bayesian information in their Maildir), I have
> ruled out MySQL and PostgreSQL; my only choice now is Oracle. Is it easy
> to integrate with SpamAssassin? How does it work under heavy mail
> concurrency? It seems that the row count of the bayes_token table will be
> incredibly high (150,000 token entries per user → 15,000,000,000 rows). Any
> suggestions for building tablespaces? All experiences or comments will be
> much appreciated. Thanks to all.
> 

At the risk of garnering the wrath of Michael Scheidell.....


Oracle for Bayesian databases is lightly tested.  I've never tested it
myself, but I know others who have.  It will probably be best to create a
custom storage module for your circumstances.  With a custom module it
would be possible to split up the database in a way that makes it
a) easier to manage and/or b) higher performing.
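
One way such a split could work is to map each user to one of a fixed number of partitions with a stable hash, so that every lookup for a given user always lands in the same place. The sketch below is only an illustration of that idea, not how any existing SpamAssassin storage module behaves: the partition count, the function names `partition_for` and `table_for`, and the `bayes_token_pNN` naming scheme are all hypothetical.

```python
import hashlib

NUM_PARTITIONS = 64  # hypothetical; pick to suit the hardware


def partition_for(username, num_partitions=NUM_PARTITIONS):
    """Map a username to a partition number in [0, num_partitions).

    A cryptographic digest is used only because it is stable across
    processes and machines, unlike Python's built-in hash().
    """
    digest = hashlib.sha1(username.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def table_for(username):
    """Return the (hypothetical) per-partition token table for a user."""
    return f"bayes_token_p{partition_for(username):02d}"


if __name__ == "__main__":
    for name in ("alice", "bob", "carol"):
        print(name, "->", table_for(name))
```

A fixed modulus keeps the mapping trivial, but changing the partition count later would move most users to a different partition, so the count should be chosen with the expected growth in mind.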

I'm very interested in Oracle support in general, and if you make any
improvements to the existing storage modules, I'd be happy to work with
you to get them folded back into the main distribution.

Michael