Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/04/03 03:20:59 UTC
[Bug 3225] RFE: Bayes optimizations
http://bugzilla.spamassassin.org/show_bug.cgi?id=3225
------- Additional Comments From parkerm@pobox.com 2004-04-02 17:20 -------
Subject: Re: RFE: Bayes optimizations
On Sun, Mar 28, 2004 at 05:37:14PM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote:
>
> ------- Additional Comments From sidney@sidney.com 2004-03-28 17:37 -------
> Note that the tok_get_all in the patch queries all of the tokens extracted from
> a message at once without checking if the resulting SELECT is too large for
> MySQL. I did not test with Michael Parker's suggestion of querying 25 at a time.
>
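The problem quoted above, a single SELECT ... IN (...) covering every token in a message, can be bounded by splitting the token list into fixed-size batches. What follows is a minimal sketch in Python, with sqlite3 standing in for MySQL. It is not SpamAssassin's Perl code. The bayes_token table, its columns, and the helpers chunks() and make_demo_db() are hypothetical. The batch size of 25 comes from the suggestion mentioned in the quote.

```python
import sqlite3
from typing import Dict, Iterable, List, Sequence, Tuple

# Batch size from the suggestion quoted above; an assumption, tune per backend.
BATCH_SIZE = 25


def chunks(items: Sequence[str], size: int) -> Iterable[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def tok_get_all(conn: sqlite3.Connection, tokens: Iterable[str],
                batch_size: int = BATCH_SIZE) -> Dict[str, Tuple[int, int, int]]:
    """Look up many tokens using one bounded IN (...) query per batch.

    Returns {token: (spam_count, ham_count, atime)}. Tokens that are not in
    the (hypothetical) bayes_token table are simply absent from the result.
    """
    unique: List[str] = list(dict.fromkeys(tokens))  # de-duplicate, keep order
    found: Dict[str, Tuple[int, int, int]] = {}
    for batch in chunks(unique, batch_size):
        placeholders = ",".join("?" * len(batch))  # e.g. "?,?,?"
        sql = ("SELECT token, spam_count, ham_count, atime FROM bayes_token "
               f"WHERE token IN ({placeholders})")
        for token, spam, ham, atime in conn.execute(sql, list(batch)):
            found[token] = (spam, ham, atime)
    return found


def make_demo_db(n: int) -> sqlite3.Connection:
    """In-memory stand-in for the token table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bayes_token (token TEXT PRIMARY KEY, "
                 "spam_count INTEGER, ham_count INTEGER, atime INTEGER)")
    conn.executemany("INSERT INTO bayes_token VALUES (?, ?, ?, ?)",
                     [(f"t{i}", i, 2 * i, 1000 + i) for i in range(n)])
    conn.commit()
    return conn
```

For example, `tok_get_all(make_demo_db(60), [f"t{i}" for i in range(60)])` issues three SELECTs of at most 25 placeholders each instead of one with 60. Statement length stays bounded regardless of message size. For MySQL the relevant ceiling is presumably max_allowed_packet. Larger batches mean fewer round trips, while smaller ones mean shorter statements.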
I'm currently finishing up a fairly large set of storage (SQL and
DBM) optimizations and changes. One is Sidney's tok_get_all, with
batching added. Another moves token_count and the newest/oldest token
age values into the bayes_vars table, to avoid some table scans.
Another adds a cache to avoid having to go to the database for
multiple items. The rest is dead-code removal and general cleanup.
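Keeping token_count and the newest/oldest token ages in bayes_vars means they can be read from a single row instead of running COUNT(*), MIN(atime) and MAX(atime) over the whole token table. Below is a minimal sketch of that idea, again in Python with sqlite3 as a stand-in. The schema is simplified and hypothetical. Only bayes_vars and token_count come from the message. The oldest_token_age/newest_token_age columns, add_token() and read_vars() are illustrative names.

```python
import sqlite3
from typing import Optional, Tuple

# Hypothetical, simplified schema: bayes_vars holds one row of counters so
# readers need not scan bayes_token with COUNT(*)/MIN(atime)/MAX(atime).
SCHEMA = """
CREATE TABLE bayes_token (
    token TEXT PRIMARY KEY,
    spam_count INTEGER NOT NULL,
    ham_count INTEGER NOT NULL,
    atime INTEGER NOT NULL
);
CREATE TABLE bayes_vars (
    id INTEGER PRIMARY KEY CHECK (id = 1),
    token_count INTEGER NOT NULL,
    oldest_token_age INTEGER,
    newest_token_age INTEGER
);
INSERT INTO bayes_vars VALUES (1, 0, NULL, NULL);
"""


def add_token(conn: sqlite3.Connection, token: str, spam: int, ham: int,
              atime: int) -> None:
    """Record a token and keep bayes_vars in step, in one transaction."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE bayes_token SET spam_count = spam_count + ?, "
            "ham_count = ham_count + ?, atime = MAX(atime, ?) WHERE token = ?",
            (spam, ham, atime, token))
        if cur.rowcount == 0:
            # New token: bump the count and possibly lower the oldest age.
            conn.execute("INSERT INTO bayes_token VALUES (?, ?, ?, ?)",
                         (token, spam, ham, atime))
            conn.execute(
                "UPDATE bayes_vars SET token_count = token_count + 1, "
                "oldest_token_age = MIN(COALESCE(oldest_token_age, ?), ?) "
                "WHERE id = 1", (atime, atime))
        # Any write may raise the newest age.
        conn.execute(
            "UPDATE bayes_vars SET newest_token_age = "
            "MAX(COALESCE(newest_token_age, ?), ?) WHERE id = 1",
            (atime, atime))


def read_vars(conn: sqlite3.Connection) -> Tuple[int, Optional[int], Optional[int]]:
    """Single-row read of (token_count, oldest_token_age, newest_token_age)."""
    return conn.execute(
        "SELECT token_count, oldest_token_age, newest_token_age "
        "FROM bayes_vars WHERE id = 1").fetchone()
```

One trade-off in this sketch: oldest_token_age is only lowered when a new token arrives. If the oldest token is touched again, the stored value becomes a lower bound rather than the exact minimum. An expiry pass, which has to scan the tokens anyway, would be a natural place to recompute it and to decrement token_count for deleted tokens (not shown).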
The changes span a fairly large range of the code and will require a
schema change. I hope to have everything finished this weekend,
assuming I don't get pulled away for something else.
One thing of note: I now have what I consider a fairly decent benchmark
script, which I've been using for my testing. Hopefully I can package
it up for others to use as well. It makes comparing changes and
different storage backends very easy.
Michael