Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/09/11 02:55:57 UTC
[Bug 3771] New: PostgreSQL Specific Bayes Storage Module
http://bugzilla.spamassassin.org/show_bug.cgi?id=3771
Summary: PostgreSQL Specific Bayes Storage Module
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Platform: Other
OS/Version: other
Status: NEW
Severity: normal
Priority: P5
Component: Learner
AssignedTo: dev@spamassassin.apache.org
ReportedBy: parkerm@pobox.com
The current SQL implementation is simply not very PostgreSQL friendly, so there
is a need for a more specific implementation.
Part of the work involved is changing the token column in bayes_token from
char(5) to a bytea type. This means that we have to convert some of the SQL
calls to use bind_param so that we can specify the proper type, for instance:
$sth->bind_param(2, $token, { pg_type => DBD::Pg::PG_BYTEA });
This took performance from unusable (i.e., I killed the run after about 15
hours on the first operation in my benchmark) to somewhat livable.
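
For context, here is a rough sketch of how that typed bind might sit inside a
full prepare/execute cycle. The helper name token_counts and the id,
spam_count and ham_count columns are assumptions made for illustration, not
taken from the actual module; only the bind_param call itself comes from the
work described above.

```perl
use strict;
use warnings;
use DBI;
use DBD::Pg;   # provides the DBD::Pg::PG_BYTEA type constant

# Hypothetical helper: fetch the counts stored for one token.
# The column names used here are assumptions for illustration only.
sub token_counts {
    my ($dbh, $id, $token) = @_;

    my $sth = $dbh->prepare(
        'SELECT spam_count, ham_count FROM bayes_token '
      . 'WHERE id = ? AND token = ?'
    );

    # Ordinary bind for the id; typed bind for the binary token so that
    # DBD::Pg sends it as bytea instead of as text.
    $sth->bind_param(1, $id);
    $sth->bind_param(2, $token, { pg_type => DBD::Pg::PG_BYTEA });
    $sth->execute();

    my ($spam, $ham) = $sth->fetchrow_array();
    $sth->finish();
    return ($spam, $ham);
}
```

The typed bind is only needed for the bytea parameter; the other placeholders
can keep using the default binding.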
The next part is figuring out how to make it even faster. A few ideas batted
around are:
1) Transactions: turn off autocommit and start a transaction when you tie the DB
and commit when you call untie. PostgreSQL apparently works very well in this
model.
2) Some specific PL/pgSQL code for the _put_token method. It's pretty
complicated and would probably do well to be implemented a little closer to the
database.
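
The transaction idea in 1) could look something like the sketch below. The
function names tie_db, untie_db and learn_batch are hypothetical stand-ins
for the storage module's tie/untie steps, and the connection parameters are
placeholders; this only illustrates the DBI mechanics, not a finished design.

```perl
use strict;
use warnings;
use DBI;

# Hypothetical stand-in for the "tie" step: with AutoCommit off, DBI keeps
# a transaction open until commit or rollback is called.
sub tie_db {
    my ($dsn, $user, $pass) = @_;
    return DBI->connect($dsn, $user, $pass,
        { AutoCommit => 0, RaiseError => 1, PrintError => 0 });
}

# Hypothetical stand-in for the "untie" step: commit on success, roll back
# on failure, then drop the connection.
sub untie_db {
    my ($dbh, $ok) = @_;
    if ($ok) { $dbh->commit } else { $dbh->rollback }
    $dbh->disconnect;
    return $ok;
}

# Example caller: $work is a code ref that performs all token updates.
sub learn_batch {
    my ($dsn, $user, $pass, $work) = @_;
    my $dbh = tie_db($dsn, $user, $pass);
    # RaiseError turns any failed statement into a die, caught here.
    my $ok = eval { $work->($dbh); 1 } ? 1 : 0;
    return untie_db($dbh, $ok);
}
```

With this shape every token update between tie and untie lands in a single
commit, instead of one commit per statement as with autocommit on.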
Something else to consider, although I can't think of a good way to implement
it, is a call to VACUUM ANALYZE built into the code. I was only able to get
sort of decent performance after I set up a cron job to run VACUUM ANALYZE
once a minute on the bayes tables, pretty sad really.
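
For illustration only, issuing the vacuum from Perl might look like the sketch
below. The helper name vacuum_bayes_tables is hypothetical, and when or how
often the module should call it is exactly the open question. One known
constraint is that PostgreSQL refuses to run VACUUM inside a transaction
block, so this sketch assumes a separate handle with AutoCommit on.

```perl
use strict;
use warnings;
use DBI;

# Hypothetical helper: run VACUUM ANALYZE on each named table.
# Opens its own connection with AutoCommit on, because VACUUM cannot be
# executed inside a transaction block.
sub vacuum_bayes_tables {
    my ($dsn, $user, $pass, @tables) = @_;

    my $dbh = DBI->connect($dsn, $user, $pass,
        { AutoCommit => 1, RaiseError => 1, PrintError => 0 });

    for my $table (@tables) {
        # Quote the table name so it is always treated as an identifier.
        $dbh->do('VACUUM ANALYZE ' . $dbh->quote_identifier($table));
    }

    $dbh->disconnect;
    return scalar @tables;   # number of tables processed
}
```

A call such as vacuum_bayes_tables($dsn, $user, $pass, 'bayes_token') would
replace one run of the cron job for that table.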