Posted to dev@spamassassin.apache.org by Michael Parker <pa...@pobox.com> on 2004/06/30 00:30:32 UTC
Bayes Learning/Scanning 2.63 vs 3.0
Howdy,
I've adapted my bayes benchmark[1] so that I can compare runs of 2.63
and 3.0, testing just the bayes learning, scanning, and forgetting.
I found a few interesting things. I rarely offer concrete conclusions
based on data I generate, and this case is no different, so feel free
to take anything here with a grain of salt. Since 2.63 only offers
bayes via DBM files, assume I'm talking about DBM storage throughout.
1) Speed-wise, 2.63 and 3.0 are pretty much the same, except that
learning is about 30% faster under 2.63. I think I found a reason for
this (see point 2).
2) 2.63 learned about 30% fewer tokens on the initial learn than 3.0
did.
3) Size-wise, the 3.0 database is slightly (2%) bigger than the 2.63
one, but it contains 30% more tokens (see point 2).
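To make the relationship between points 2 and 3 concrete, here is a
small arithmetic sketch. It only reworks the rounded percentages quoted
above and is not measured data; the function names are made up for
illustration. Note that "30% fewer" and "30% more" are not quite the
same ratio, so the two points should be read as approximate.

```python
def relative_increase(fraction_fewer):
    """If A holds `fraction_fewer` fewer tokens than B,
    return how much more B holds relative to A."""
    return 1.0 / (1.0 - fraction_fewer) - 1.0


def bytes_per_token_ratio(size_growth, token_growth):
    """Per-token storage of the larger database relative to the smaller
    one, given fractional growth in file size and in token count."""
    return (1.0 + size_growth) / (1.0 + token_growth)


# Point 2 taken literally: 2.63 learned 30% fewer tokens than 3.0.
more_tokens = relative_increase(0.30)

# Point 3 taken literally: 2% bigger file, 30% more tokens.
per_token = bytes_per_token_ratio(0.02, 0.30)

print(f"3.0 holds about {more_tokens:.0%} more tokens than 2.63")
print(f"3.0 uses about {per_token:.0%} of the space per token")
```

Under these rounded figures the 3.0 database stores each token in
roughly three quarters of the space, which would be consistent with the
file growing only slightly despite the extra tokens.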
One small footnote: I ran the 3.0 spamd in full pre-fork mode with
--max-conn-per-child set to the default of 200. Setting it to 1
caused an odd bug and a slowdown in total processing time. I estimate
the slowdown at 40%, but it is hard to measure because of the buglet.
Michael
[1] The benchmark performs the following:
Learn 2000 ham
Learn 2000 spam
Start up spamd
Simultaneously run 2000 ham and 2000 spam through spamd via spamc
Run a --force-expire
Forget 1000 ham (from the first learn)
Forget 1000 spam (from the first learn)
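
For anyone wanting to reproduce something similar, the steps above
could be laid out roughly as in the sketch below. This is not the
actual benchmark script. The corpus directory names are hypothetical,
the spamd options are an assumption based on the footnote about
--max-conn-per-child, and the sketch only builds the command lines
without running anything. Older sa-learn versions may also need an
explicit format option for directories.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    name: str
    argv: List[str]
    note: str = ""


def build_benchmark_plan(corpus: str = "corpus") -> List[Step]:
    """Return the benchmark steps in order as command lines.

    The directory layout under `corpus` is hypothetical: each
    directory is assumed to hold one message per file.
    """
    return [
        Step("learn ham", ["sa-learn", "--ham", f"{corpus}/ham-learn"]),
        Step("learn spam", ["sa-learn", "--spam", f"{corpus}/spam-learn"]),
        # -d daemonizes spamd; 200 is the default mentioned in the post.
        Step("start spamd", ["spamd", "-d", "--max-conn-per-child=200"]),
        Step("scan", ["spamc"],
             note="one invocation per message, message on stdin; "
                  "ham and spam streams run at the same time"),
        Step("expire", ["sa-learn", "--force-expire"]),
        Step("forget ham", ["sa-learn", "--forget", f"{corpus}/ham-forget"]),
        Step("forget spam", ["sa-learn", "--forget", f"{corpus}/spam-forget"]),
    ]


if __name__ == "__main__":
    for step in build_benchmark_plan():
        line = " ".join(step.argv)
        print(f"{step.name}: {line}" + (f"  # {step.note}" if step.note else ""))
```

Timing each step separately (rather than the run as a whole) is what
makes the learning, scanning, and forgetting numbers comparable
between versions.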