Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/04/03 03:20:59 UTC

[Bug 3225] RFE: Bayes optimizations

http://bugzilla.spamassassin.org/show_bug.cgi?id=3225





------- Additional Comments From parkerm@pobox.com  2004-04-02 17:20 -------
Subject: Re:  RFE: Bayes optimizations

On Sun, Mar 28, 2004 at 05:37:14PM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> 
> ------- Additional Comments From sidney@sidney.com  2004-03-28 17:37 -------
> Note that the tok_get_all in the patch queries all of the tokens extracted from
> a message at once without checking if the resulting SELECT is too large for
> MySQL. I did not test with Michael Parker's suggestion of querying 25 at a time.
> 

I'm currently finishing up a fairly large hunk of storage (SQL and
DBM) optimizations/changes.  One is Sidney's tok_get_all with
batching.  Another moves token_count and the newest/oldest token
age values into the bayes_vars table (to avoid some table scans).  A
third adds a cache to avoid having to go to the database for multiple
items.  The rest is dead-code removal and general cleanup.
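
For illustration only, the batching and caching ideas could be sketched
roughly as below.  This is Python with sqlite3, not the project's Perl,
and it is not the actual patch: the bayes_token table layout, the
spam_count/ham_count columns, the cache dictionary, and the function
signature are all assumptions made for the example.  The batch size of
25 is the figure mentioned in the quoted comment.

```python
import sqlite3

BATCH_SIZE = 25  # batch size mentioned in the bug discussion


def tok_get_all(conn, tokens, cache=None, batch_size=BATCH_SIZE):
    """Return {token: (spam_count, ham_count)} for the tokens that exist.

    Hypothetical sketch: looks tokens up in an assumed bayes_token table,
    at most batch_size per SELECT, and skips any token already in cache.
    """
    cache = {} if cache is None else cache
    results = {}
    missing = []

    # De-duplicate while keeping order; serve what we can from the cache.
    for tok in dict.fromkeys(tokens):
        if tok in cache:
            results[tok] = cache[tok]
        else:
            missing.append(tok)

    # Query the remaining tokens in fixed-size batches so that no single
    # SELECT grows with the size of the message.
    for start in range(0, len(missing), batch_size):
        batch = missing[start:start + batch_size]
        placeholders = ",".join("?" * len(batch))
        rows = conn.execute(
            "SELECT token, spam_count, ham_count FROM bayes_token "
            f"WHERE token IN ({placeholders})",
            batch,
        )
        for token, spam, ham in rows:
            cache[token] = results[token] = (spam, ham)

    return results
```

Tokens that are not in the table are simply absent from the result, so
a caller would treat a missing key as "never seen".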

They span a fairly large range, and will require a schema change.  I
hope to have everything finished up this weekend, assuming I don't get
pulled away for something else.

One thing of note: I've got what I consider a fairly decent benchmark
script now that I've been using for my testing. Hopefully I can
package it up for others to use as well.  It makes comparing changes
and different storage backends very easy.

Michael





------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.