Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2006/11/10 18:05:43 UTC
Re: 3.2.0?
Kevin A. McGrail writes:
> > hey -- anyone think we should consider getting 3.2.0 out before January? I
> > think it may be doable.
> >
> > The one major feature I want to get in is the re2c/sa-compile speedup code
> > in the side branch -- it provides about a 20% speedup of scanning by
> > compiling parts of the ruleset into native code, which is nice. ;)
>
> I would like to see it released before January. The 20% speedup sounds
> amazing, especially because I see more and more rules each day. Is there any
> reduction in RAM usage as well? I assume CPU usage drops by 20% too, just
> because it finishes quicker.
Yep, CPU time goes down by a similar amount (SpamAssassin is generally
CPU-bound).
However, I don't think it really helps RAM usage; unfortunately, it probably
increases it a little. I agree reducing RAM usage is important, though,
especially now that RAM-to-CPU bandwidth is becoming even more of a
bottleneck than CPU time... I need to look into this more.
Here are some timings, btw. I tested it on a couple of weeks of my corpus
-- 3395 hams and 15795 spams -- using perl 5.8.8, mass-check, and the
latest SVN trunk ruleset including sandbox rules.
Without rule2xs active:
real avg=2037.131s min=2032.047s max=2045.501s count=3
user avg=1884.417s min=1881.802s max=1887.930s count=3
sys avg=29.990s min=28.354s max=31.446s count=3
That's 19190 / 2037.131 = 9.42 messages/sec.
With the compiled ruleset:
real avg=1781.106s min=1769.190s max=1797.974s count=4
user avg=1637.173s min=1633.754s max=1640.727s count=4
sys avg=27.706s min=22.197s max=31.578s count=4
= 10.77 messages/sec (19190 / 1781.106), about a 14% speedup in throughput.
(The speedup varies depending on which rules are loaded and what mail is
scanned, btw, hence 14 != 20.)
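For anyone checking the arithmetic, here's a small Python sketch that
recomputes both rates from the mean wall-clock ("real") times above. The
14% figure reads as a throughput ratio (10.77 / 9.42); measured as
wall-clock time saved, the same runs come out a little lower. The constant
and function names are just illustrative.

```python
# Mean wall-clock ("real") seconds from the runs quoted above, same corpus.
HAMS, SPAMS = 3395, 15795
MESSAGES = HAMS + SPAMS  # 19190 messages in total

BASELINE_SECS = 2037.131  # without rule2xs
COMPILED_SECS = 1781.106  # with the compiled ruleset


def throughput(messages: int, seconds: float) -> float:
    """Messages scanned per second of wall-clock time."""
    return messages / seconds


baseline = throughput(MESSAGES, BASELINE_SECS)
compiled = throughput(MESSAGES, COMPILED_SECS)

# Speedup expressed as extra throughput (how the 14% figure reads) versus
# the fraction of wall-clock time saved, which is a smaller number.
throughput_gain = compiled / baseline - 1
time_saved = 1 - COMPILED_SECS / BASELINE_SECS

print(f"{baseline:.2f} msg/s -> {compiled:.2f} msg/s")
print(f"throughput gain: {throughput_gain:.1%}, "
      f"wall-clock time saved: {time_saved:.1%}")
```

Run as-is, it prints 9.42 -> 10.77 msg/s, a 14.4% throughput gain and
12.6% of wall-clock time saved.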
> On a similar topic, perhaps: I have been wondering whether the compilation to
> native code could remove the need for ?: on every () group in a regexp. I
> find that A) I'm lazy about adding them and B) they make some of the more
> complex rules insanely hard to read and debug.
Yeah -- it'll do this automatically. However, it's an optional plugin,
and most people will probably not be using it -- so we can't count on
it being loaded :(
For what it's worth, we should extend --lint to warn about these; that
would make it pretty clear when a rule needs fixing, I think.
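Here's a rough Python sketch of what such a lint warning, plus the
automatic ?: insertion discussed in this thread, might look like. It's
only an illustration under several assumptions: the function names are
hypothetical, only \1..\9 backreferences are recognised, Python's re
syntax merely overlaps with Perl's, and SpamAssassin's real --lint does
not necessarily work this way.

```python
import re
from typing import Optional

# Assumption: only Perl-5.8-era numeric backreferences (\1..\9) matter.
# An escaped backslash followed by a digit is a false positive, which errs
# on the safe side: such a rule is simply left alone.
BACKREF = re.compile(r"\\[1-9]")


def make_groups_noncapturing(pattern: str) -> str:
    """Rewrite every bare '(' (a capturing group) as '(?:'.

    Simplified scanner: it skips escaped characters and character classes
    and leaves any '(?...' construct untouched. It does not understand /x
    comments or Perl's more exotic syntax.
    """
    out = []
    i = 0
    in_class = False
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            # Copy the escape and the escaped character verbatim.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            if c == "]":
                in_class = False
        elif c == "[":
            out.append(c)
            i += 1
            # A ']' right after '[' or '[^' is literal, not the end of the class.
            if pattern.startswith("^", i):
                out.append("^")
                i += 1
            if pattern.startswith("]", i):
                out.append("]")
                i += 1
            in_class = True
            continue
        elif c == "(" and not pattern.startswith("?", i + 1):
            out.append("(?:")
            i += 1
            continue
        out.append(c)
        i += 1
    return "".join(out)


def lint_capturing_groups(pattern: str) -> Optional[str]:
    """Return a warning if the pattern captures text it never refers back to."""
    if BACKREF.search(pattern):
        return None  # the captures are needed by \1, \2, ...
    rewritten = make_groups_noncapturing(pattern)
    if rewritten != pattern:
        return "unneeded capturing group(s); consider: " + rewritten
    return None
```

For example, a hypothetical body pattern \b(cheap|free) (pills|meds)\b
would draw a warning suggesting \b(?:cheap|free) (?:pills|meds)\b, while
\b(\w+) \1\b is left alone because it actually uses its capture.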
> I've been talking with Mark Damrose about this, and since you have to use \\1,
> \\2, etc. for the replacements, could re2c/sa-compile be changed to also
> automatically add ?: to regexps that don't use \\1, etc.? This should
> save a little RAM and overhead, though I'm not sure how much, really.
Hmm, unfortunately \1 and so on are too advanced for the rule2xs compiler;
it'll leave those rules as non-compiled body rules. re2c just isn't up to
the full perl regexp vocabulary -- despite the sterling work that Matt
Sergeant has done in writing the compiler code to translate much of it,
there's still a lot of flexibility in perl's regexps that doesn't
translate to the re2c model (something to do with DFAs vs NFAs, I think ;)
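One way to picture that split: a DFA keeps no memory of what an earlier
group matched, so a rule using \1 can't be turned into one, and such rules
stay in the interpreted set. The following Python sketch is a hypothetical
heuristic, not rule2xs's actual criteria (this thread doesn't spell those
out); treating lookaround as unsupported as well is purely an assumption,
and the rule names are made up for illustration.

```python
import re

# Hypothetical heuristics for Perl regexp features assumed to be outside what
# a DFA-based translator handles. Backreferences are genuinely non-regular;
# lumping lookaround in with them is an assumption made for this sketch.
UNSUPPORTED = {
    "backreference": re.compile(r"\\[1-9]"),
    "lookaround": re.compile(r"\(\?<?[=!]"),
}


def blocking_features(pattern):
    """Names of features that would keep this pattern out of the compiled set."""
    return [name for name, rx in UNSUPPORTED.items() if rx.search(pattern)]


def partition(rules):
    """Split {rule_name: pattern} into (compiled, interpreted) dicts."""
    compiled, interpreted = {}, {}
    for name, pattern in rules.items():
        target = interpreted if blocking_features(pattern) else compiled
        target[name] = pattern
    return compiled, interpreted


# Hypothetical rule names and patterns, for illustration only.
rules = {
    "CHEAP_PILLS": r"\b(?:cheap|free) pills\b",
    "REPEATED_WORD": r"\b(\w+) \1\b",
    "NOT_DOTCOM": r"pharmacy(?!\.com)",
}
compiled, interpreted = partition(rules)
```

Here only CHEAP_PILLS lands in the compiled set; the other two would still
be run by perl's own regexp engine as ordinary body rules.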
(Oh yeah -- credit where it's due -- Matt is the guy who wrote much of
this, especially the rule2xs code, which translates perl regexps into re2c
code packaged as a perl XS module. My hacking is mostly glue ;)
--j.