Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/11/04 20:29:15 UTC

[Bug 3951] New: Possible tok_get_all optimization

http://bugzilla.spamassassin.org/show_bug.cgi?id=3951

           Summary: Possible tok_get_all optimization
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: Learner
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: parkerm@pobox.com


Based on comments from Bug 3225 it may be possible to optimize tok_get_all a
good bit.

The basic idea is to use dynamic queries, instead of fixed queries of set
bunch sizes, to fetch all of the tokens from the database.  In theory this
works out ok because, due to atime updates, the query cache is pretty much
invalid anyway.  We still need to limit the number of tokens fetched per query
due to SQL statement limits, but that limit should be tunable.
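
To make the idea concrete, here is a rough sketch in Python with sqlite3
rather than the actual Perl/DBI code in the SQL Bayes store.  The function
signature, the make_demo_db helper and the exact table layout are assumptions
made for illustration only.  It builds one IN (...) query per chunk, sized to
the chunk, with a tunable cap on tokens per statement.

```python
import sqlite3


def make_demo_db():
    """Hypothetical in-memory stand-in for the bayes_token table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bayes_token (id INTEGER, token TEXT, spam_count INTEGER,"
        " ham_count INTEGER, atime INTEGER, PRIMARY KEY (id, token))"
    )
    conn.executemany(
        "INSERT INTO bayes_token VALUES (1, ?, ?, ?, 0)",
        [(f"tok{i}", i, 2 * i) for i in range(250)],
    )
    return conn


def tok_get_all(conn, user_id, tokens, limit=100):
    """Fetch rows for `tokens`, at most `limit` tokens per generated query."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    rows = []
    for start in range(0, len(tokens), limit):
        chunk = tokens[start:start + limit]
        # The statement is built to fit the chunk, so the last (short) chunk
        # needs no padding and no separate fixed-size fallback query.
        placeholders = ",".join("?" * len(chunk))
        sql = (
            "SELECT token, spam_count, ham_count, atime FROM bayes_token "
            f"WHERE id = ? AND token IN ({placeholders})"
        )
        rows.extend(conn.execute(sql, [user_id, *chunk]).fetchall())
    return rows
```

The limit argument is the tunable cap mentioned above; tokens that are not in
the database are simply absent from the result.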

FWIW, I tried this (with a limit of 100) using my Bayes Benchmark (which scans
8000 ham and spam msgs with spamc/spamd and spamassassin) and found that it
gives only a very small performance increase (4% at the most), nothing like
what the original comment states (57%).  Using a limit of 200 causes a
slowdown in some cases.

I think we need to do some analysis and determine the optimal combination here.


