Posted to dev@spamassassin.apache.org by Michael Parker <pa...@pobox.com> on 2004/06/30 00:30:32 UTC

Bayes Learning/Scanning 2.63 vs 3.0

Howdy,

I've adapted my bayes benchmark[1] so that I can compare a run under
2.63 with a run under 3.0, testing just bayes learning, scanning, and
forgetting.

I found a few interesting things.  I rarely offer concrete conclusions
based on data I generate, and this case is no different, so feel free
to take anything here with a grain of salt.  Since 2.63 only offers
bayes via DBM files, assume DBM storage for both versions throughout.

1) Speed-wise, 2.63 and 3.0 are pretty much the same overall.  Learning
   is about 30% faster under 2.63, but I think I found a reason for
   this (see point 2).

2) 2.63 learned about 30% fewer tokens on the initial learn than 3.0
   did.

3) Size-wise, the 3.0 database is slightly (2%) bigger than the 2.63
   database, but it contains 30% more tokens (see point 2).
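
Taken together, points 2 and 3 imply that 3.0 stores each token in
less space than 2.63 does.  Here is a rough sketch of that arithmetic.
It assumes the rounded percentages above are exact, and it works out
both readings of the token figure, since "2.63 has 30% fewer" and
"3.0 has 30% more" are not quite the same statement.  The function
name is made up for illustration.

```python
def bytes_per_token_ratio(size_increase, token_increase):
    """Return (3.0 bytes per token) / (2.63 bytes per token).

    Both arguments are fractional increases of 3.0 over 2.63,
    e.g. 0.02 means "2% bigger".
    """
    return (1.0 + size_increase) / (1.0 + token_increase)


# Reading A: 3.0 holds 30% more tokens than 2.63 (point 3 as worded).
ratio_a = bytes_per_token_ratio(0.02, 0.30)

# Reading B: 2.63 holds 30% fewer tokens than 3.0 (point 2 as worded),
# which means 3.0 holds 1/0.70 - 1, or roughly 43%, more tokens.
ratio_b = bytes_per_token_ratio(0.02, 1.0 / 0.70 - 1.0)

for label, ratio in (("A", ratio_a), ("B", ratio_b)):
    saving = (1.0 - ratio) * 100.0
    print(f"Reading {label}: 3.0 uses about {saving:.0f}% less space per token")
```

Either way the per-token footprint under 3.0 comes out somewhere
between a fifth and a third smaller, subject to the same grain of salt
as the numbers it is built on.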

One small footnote: I ran the 3.0 spamd in full pre-fork mode with
--max-conn-per-child set to the default of 200.  Setting it to 1
caused an odd bug and a slowdown in total processing time.  I estimate
the slowdown at 40%, but it is hard to measure because of the buglet.

Michael

[1] The benchmark performs the following:
      Learn 2000 ham
      Learn 2000 spam
      Start up spamd
        Simultaneously run 2000 ham and 2000 spam through spamc
      Run a --force-expire
      Forget 1000 ham (from the first learn)
      Forget 1000 spam (from the first learn)
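
The sequence in [1] could be laid out as a dry-run plan along the
following lines.  This is only a sketch and not the actual benchmark
script, which is not shown here.  The corpus directory names and the
function name are hypothetical, the sa-learn and spamd options are
written as I understand them for 3.0 and may differ under 2.63, and
nothing is executed: the code only builds and prints the list of
commands.

```python
def benchmark_plan(corpus="corpus"):
    """Build the benchmark steps as (description, argv) pairs.

    The directory layout under `corpus` is a hypothetical example:
    ham/ and spam/ hold 2000 messages each, and ham-forget/ and
    spam-forget/ hold the 1000 messages from each set to be forgotten.
    """
    return [
        ("Learn 2000 ham", ["sa-learn", "--ham", f"{corpus}/ham"]),
        ("Learn 2000 spam", ["sa-learn", "--spam", f"{corpus}/spam"]),
        ("Start up spamd", ["spamd", "-d", "--max-conn-per-child=200"]),
        # Each message is piped to spamc on stdin; the ham and spam
        # streams are meant to run at the same time.
        ("Scan 2000 ham and 2000 spam via spamc", ["spamc"]),
        ("Force an expire", ["sa-learn", "--force-expire"]),
        ("Forget 1000 ham", ["sa-learn", "--forget", f"{corpus}/ham-forget"]),
        ("Forget 1000 spam", ["sa-learn", "--forget", f"{corpus}/spam-forget"]),
    ]


plan = benchmark_plan()
for number, (description, argv) in enumerate(plan, start=1):
    print(f"{number}. {description}: {' '.join(argv)}")
```

Keeping the plan as data makes it easy to time each step separately
once the commands are actually run, for example by wrapping each one
in a timer.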