Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/11/02 01:20:26 UTC

[Bug 3225] RFE: Bayes optimizations

http://bugzilla.spamassassin.org/show_bug.cgi?id=3225





------- Additional Comments From spamassassin@dostech.ca  2004-11-01 16:20 -------
From Michael Parker, 2004-04-12 15:00:
> 1) Implements Sidney's tok_get_all method for SQL and DBM.  Right now the SQL
> version will get the tokens from the DB in chunks (100, 50, 25, 5, 1) which
> needs to be benchmarked and tweaked based on what works the best.

For MySQL at least, is there a reason to try to cache Bayes token queries?
Every time the atime of a token is updated, the query cache for the entire
bayes_token table is cleared, so these queries are very rarely served from cache.

Since these token queries don't benefit from the SQL server's query cache,
there's no point in caching them (the entries will be cleared anyway) and no
need to worry about blowing away the cache (which I believe was the reason for
the bunches).

I recently timed token queries for about 4600 messages as they were received
by my mail server.  Replacing the fixed bunch sizes with a while loop that
queries up to 100 tokens at a time (I didn't want to exceed any maximum query
length) reduced the time spent on token queries for an average message (about
193 tokens) by about 57%.
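
A minimal sketch of that kind of loop, in Python rather than the Perl that
SpamAssassin uses, and with an in-memory SQLite table standing in for MySQL.
The column names, the function signature, and the sample data are assumptions
made for illustration; only the bayes_token table name and the limit of 100
tokens per query come from the comment above.

```python
import sqlite3

MAX_TOKENS_PER_QUERY = 100  # upper bound per statement, as described above


def tok_get_all(conn, tokens, max_per_query=MAX_TOKENS_PER_QUERY):
    """Fetch rows for all tokens, at most max_per_query tokens per SELECT.

    Returns (rows, number_of_queries_issued). Hypothetical helper, not the
    actual SpamAssassin implementation.
    """
    rows = []
    queries = 0
    start = 0
    while start < len(tokens):
        batch = tokens[start:start + max_per_query]
        # One "?" placeholder per token in this batch.
        placeholders = ",".join("?" * len(batch))
        sql = ("SELECT token, spam_count, ham_count, atime "
               "FROM bayes_token WHERE token IN (%s)" % placeholders)
        rows.extend(conn.execute(sql, batch).fetchall())
        queries += 1
        start += len(batch)
    return rows, queries


# Stand-in database with made-up tokens (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bayes_token (token TEXT PRIMARY KEY, "
             "spam_count INTEGER, ham_count INTEGER, atime INTEGER)")
conn.executemany("INSERT INTO bayes_token VALUES (?, ?, ?, ?)",
                 [("tok%d" % i, i, 0, 0) for i in range(250)])

wanted = ["tok%d" % i for i in range(193)]  # a 193-token message
rows, queries = tok_get_all(conn, wanted)
print(len(rows), "rows fetched in", queries, "queries")
```

With 193 tokens this issues two statements (100 tokens, then 93), where the
fixed bunch sizes would need more round trips for the same message.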


Using current bunches:

2295 messages
194 tokens per message average
1.227 seconds per message
0.00630 seconds per token


Using loop, up to 100 tokens at a time:

2302 messages
192 tokens per message average
0.511 seconds per message
0.00266 seconds per token
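
As a quick check of the two sets of figures above, the relative reduction can
be computed directly from the rounded values. The helper name is made up for
this sketch; the numbers are the ones listed above.

```python
def relative_reduction(before, after):
    """Fraction of the original time that was saved."""
    return (before - after) / before


per_message = relative_reduction(1.227, 0.511)
per_token = relative_reduction(0.00630, 0.00266)

print("per message: %.1f%%" % (100 * per_message))  # per message: 58.4%
print("per token:   %.1f%%" % (100 * per_token))    # per token:   57.8%
```

From the rounded values both come out a little under 60%, in line with the
reduction quoted earlier.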


With the loop, the cache is still cleared on every atime update, just as it is
with bunches.  SQL_NO_CACHE could be added to the SELECT statement to avoid the
overhead of inserting results into a cache that is never used.
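
A rough sketch of where the modifier would go when the statement is built.
SQL_NO_CACHE is a MySQL SELECT modifier placed directly after the SELECT
keyword; the function name, the column list, and the "%s" placeholder style
(as used by common Python MySQL drivers) are assumptions for illustration.

```python
def build_token_query(n_tokens, no_cache=True):
    """Build a SELECT for n_tokens tokens, optionally bypassing the query cache.

    Hypothetical helper; the column names are assumed, not taken from the bug.
    """
    if n_tokens < 1:
        raise ValueError("need at least one token")
    modifier = "SQL_NO_CACHE " if no_cache else ""
    placeholders = ", ".join(["%s"] * n_tokens)
    return ("SELECT " + modifier + "token, spam_count, ham_count, atime "
            "FROM bayes_token WHERE token IN (" + placeholders + ")")


print(build_token_query(3))
```

The token values themselves would still be passed to the driver as bound
parameters, not interpolated into the string.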

I'd imagine other SQL servers behave similarly, but I'm not familiar with how
other servers (Oracle, Postgres, etc.) handle caching, specifically what causes
a table's cache to be cleared.


