Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/11/04 20:29:15 UTC
[Bug 3951] New: Possible tok_get_all optimization
http://bugzilla.spamassassin.org/show_bug.cgi?id=3951
Summary: Possible tok_get_all optimization
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Platform: Other
OS/Version: other
Status: NEW
Severity: enhancement
Priority: P5
Component: Learner
AssignedTo: dev@spamassassin.apache.org
ReportedBy: parkerm@pobox.com
Based on comments from Bug 3225 it may be possible to optimize tok_get_all a
good bit.
The basic idea: instead of fetching all of the tokens from the database with
fixed queries in set bunch sizes, build the queries dynamically. In theory this
works out OK because atime updates leave the query cache pretty much invalid
anyway. We still need to limit the number of tokens fetched per query due to
SQL statement limits, but that limit should be tunable.
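To make the idea concrete, here is a minimal sketch of the dynamic-query
approach. It is written in Python with sqlite3 only so that it is
self-contained; it is not the actual SpamAssassin code (which is Perl). The
table and column names (bayes_token, id, token, spam_count, ham_count, atime)
and the helper name chunked are assumptions made for illustration.

```python
import sqlite3


def chunked(items, size):
    """Yield successive slices of `items`, each at most `size` long."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def tok_get_all(conn, user_id, tokens, limit=100):
    """Fetch rows for `tokens`, building one IN (...) query per chunk.

    `limit` is the tunable cap on tokens per statement. The schema used
    here is an assumption for this sketch, not the real storage module.
    """
    rows = []
    for chunk in chunked(list(tokens), limit):
        # The statement is sized to the chunk, so the final (short) chunk
        # needs neither padding nor a separate fixed-size query.
        placeholders = ", ".join("?" for _ in chunk)
        sql = (
            "SELECT token, spam_count, ham_count, atime "
            "FROM bayes_token "
            "WHERE id = ? AND token IN (" + placeholders + ")"
        )
        rows.extend(conn.execute(sql, [user_id, *chunk]).fetchall())
    return rows
```

Tokens that are not in the database simply produce no row, so the caller gets
back only the tokens that were found. Changing limit changes how many
statements are issued per message, which is the knob that would need
benchmarking.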
FWIW, I tried this (with a limit of 100) using my Bayes Benchmark (which scans
8000 ham and spam msgs with spamc/spamd and spamassassin) and found that it
gives only a very small performance increase (4% at the most), nothing like
the 57% the original comment states. Using a limit of 200 causes a slowdown in
some cases.
I think we need to do some analysis and determine the optimal combination here.
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.