Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2007/08/13 19:14:15 UTC
[Bug 5594] New: only (re)sa-compile channel files that have changed
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594
Summary: only (re)sa-compile channel files that have changed
Product: Spamassassin
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P5
Component: sa-compile
AssignedTo: dev@spamassassin.apache.org
ReportedBy: schneecrash+apache@gmail.com
> i use sa-update to update/maintain three separate source channels of rules,
>
> sudo -u spam sa-update --channelfile DIST-ch.conf
> sudo -u spam sa-update --channelfile SARE-ch.conf --gpgkey 856AA88A
> sudo -u spam sa-update --channelfile JMAS-ch.conf --gpgkey 6C6191E3
>
> where, fwiw,
>
> cat DIST-ch.conf
> updates.spamassassin.org
>
> cat SARE-ch.conf
> 70_sare_obfu.cf.sare.sa-update.dostech.net
> 72_sare_redirect_post3.0.0.cf.sare.sa-update.dostech.net
> 70_sare_evilnum0.cf.sare.sa-update.dostech.net
> 70_sare_evilnum1.cf.sare.sa-update.dostech.net
> 70_sare_bayes_poison_nxm.cf.sare.sa-update.dostech.net
> 70_sare_header.cf.sare.sa-update.dostech.net
> 70_sare_header_eng.cf.sare.sa-update.dostech.net
> 99_sare_fraud_post25x.cf.sare.sa-update.dostech.net
> 70_sare_spoof.cf.sare.sa-update.dostech.net
> ...
>
> cat JMAS-ch.conf
> sought.rules.yerp.org
>
> works great manually &/or via cron job.
>
> i've *also* turned on,
>
> # Rule2XSBody - speedup by compilation of ruleset to native code
> loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody
>
> in my init.pre.
>
> currently, my sa-update cron-job script detects ANY changes to ANY of the
> existing channel files, and if a change exists, RE-compiles the whole set of rules.
>
> as the number of channels managed grows, the odds of a *single* channel being
> updated increase, as, then, does the probability that a RE-compile will be done.
>
> inefficient.
>
> i (think i) can simply cobble up a script to only re-compile rules for those
> channel files that HAVE been updated/changed, but thought i'd ask here first ...
>
> *is* there a clever script or process already available
> that would only re-compile those rules that NEED recompiling?
>
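The cron logic described in the message above (update each channel, recompile everything if any channel changed) could be sketched roughly as follows. This is a minimal sketch, not the reporter's actual script: the function names and the injectable `runner` are hypothetical, the script is assumed to already run as the spam user, and it relies on the documented sa-update behaviour of exiting 0 when an update was installed and 1 when nothing new was available.

```python
import subprocess

# Channel files and GPG keys are taken from the message above; the wrapper
# itself (function names, the injectable runner) is hypothetical.
CHANNELS = [
    ("DIST-ch.conf", None),
    ("SARE-ch.conf", "856AA88A"),
    ("JMAS-ch.conf", "6C6191E3"),
]


def sa_update_command(channelfile, gpgkey=None):
    """Build the sa-update command line for one channel file."""
    cmd = ["sa-update", "--channelfile", channelfile]
    if gpgkey:
        cmd += ["--gpgkey", gpgkey]
    return cmd


def run(cmd):
    """Run a command and return its exit status."""
    return subprocess.run(cmd).returncode


def update_and_compile(channels=CHANNELS, runner=run):
    """Run sa-update per channel; run sa-compile once if any channel changed.

    Exit status 0 is treated as "updated", 1 as "nothing new", and anything
    else as an error that suppresses the recompile.
    """
    updated, failed = [], []
    for channelfile, gpgkey in channels:
        status = runner(sa_update_command(channelfile, gpgkey))
        if status == 0:
            updated.append(channelfile)
        elif status != 1:
            failed.append(channelfile)
    if updated and not failed:
        # sa-compile has no per-channel mode, so the whole ruleset is rebuilt.
        runner(["sa-compile"])
    return updated, failed
```

Calling `update_and_compile()` from the cron job returns the lists of updated and failed channel files. The sketch still rebuilds the whole ruleset whenever a single channel changes, which is exactly the inefficiency the report is about.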
From: Justin Mason
Unfortunately -- not yet. It'll take code changes to sa-compile,
specifically to cache the "base strings" somewhere so they don't have to
be re-extracted next time.
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 5594] [review] speed up sa-compile
Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594
------- Additional Comments From jm@jmason.org 2007-09-18 00:49 -------
> Doing a clean build onite ... is this "in" yet? Or languishing ...?
the latter :(
it's kind of immaterial until we release the next 3.2.x point release anyway.
so far there hasn't been a serious issue (apart from in sa-compile) that would
be fixed, so that's probably why people aren't hurrying...
[Bug 5594] [review] speed up sa-compile
sidney@sidney.com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status Whiteboard|needs 1 votes for 3.2 |ready to commit for 3.2
------- Additional Comments From sidney@sidney.com 2007-12-16 04:36 -------
+1
[Bug 5594] [review] speed up sa-compile
jm@jmason.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|REOPENED |RESOLVED
Resolution| |FIXED
------- Additional Comments From jm@jmason.org 2007-12-16 13:40 -------
applied to 3.2.x: r604716
[Bug 5594] [review] speed up sa-compile
jm@jmason.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |REOPENED
Resolution|REMIND |
------- Additional Comments From jm@jmason.org 2007-10-30 16:18 -------
reopening
[Bug 5594] [review] speed up sa-compile
maddoc@maddoc.net changed:
What |Removed |Added
----------------------------------------------------------------------------
Status Whiteboard|needs 2 votes for 3.2 |needs 1 votes for 3.2
------- Additional Comments From maddoc@maddoc.net 2007-08-19 15:53 -------
+1 works good for me.
SA 3.2.3
real 5m24.817s
user 4m46.614s
sys 0m17.141s
SA 3.2.3 with sa-compile patch
real 3m25.274s
user 2m59.798s
sys 0m17.776s
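As a worked check of the two `real` times above, the following sketch converts them to seconds and computes the speedup. The helper names are made up for illustration; only the two durations come from the message.

```python
import re


def parse_time(value):
    """Convert a `time`-style duration such as '5m24.817s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def speedup(before, after):
    """Return (speedup factor, fraction of wall-clock time saved)."""
    return before / after, 1 - after / before


# 'real' times reported above for SA 3.2.3 without and with the patch.
before = parse_time("5m24.817s")
after = parse_time("3m25.274s")
factor, saved = speedup(before, after)
print(f"{factor:.2f}x faster, {saved:.1%} less wall-clock time")
```

On these figures the patched run is about 1.58 times as fast, which is roughly 37% less wall-clock time.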
[Bug 5594] [review] speed up sa-compile
jm@jmason.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|enhancement |major
------- Additional Comments From jm@jmason.org 2007-09-26 03:22 -------
that OOM bug was present in the pre-caching code in 3.2.3. this is therefore a
major fix
[Bug 5594] [review] speed up sa-compile
------- Additional Comments From schneecrash+apache@gmail.com 2007-08-19 15:47 -------
hi,
> this is already in 3.3.0 trunk; this patch is for 3.2.x. Please apply it and
> let me know if it works...
w/ co of 32x-branch, r567485 + patch(sa_bug5594_patch4098.txt)
now, @ sa-compile, in the compile/debug output, I see the requisite 'evidence'
of caching, e.g.,
...
[16751] dbg: zoom: YES (cached) /\doo[o0] d[o0][l|][l|]ars/i
[16751] dbg: zoom: NO (cached) /[\s-][a-z01]{1,10}[1|]ve[a-z01]{0,10}[,.?!]*[\s-]/i
...
and both sa-compile & --lint of my full/production ruleset are error-free.
so, that's good.
the sa-compile process is, apparently, faster, now. on *this* ancient-box (an
old 333MHz, 832Mb Mac/G3! Hey, I'm at the cottage ... life's slower here ...)
the full ruleset compile-time dropped from ~42 mins to ~22 mins.
Now, iiuc, this patch speeds up compilation, but it is still re-compiling "all"
the in-place rules, yes?
[Bug 5594] [review] speed up sa-compile
jm@jmason.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|only (re)sa-compile channel files that have changed |[review] speed up sa-compile
Status Whiteboard| |needs 2 votes for 3.2
Target Milestone|3.3.0 |3.2.3
[Bug 5594] speed up sa-compile
jm@jmason.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|[review] speed up sa-compile|speed up sa-compile
Status Whiteboard|needs 1 votes for 3.2 |needs update
------- Additional Comments From jm@jmason.org 2007-09-19 05:20 -------
stop the presses -- this needs to be updated to include the fix from r577272,
which fixes an OOM:
: jm 25...; svn commit -m "avoid massive memory blow-up in sa-compile; seems
/\b/ isn't the right thing to use when matching the list of already-subsumed
base string names" lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
Sending lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
Transmitting file data .
Committed revision 577272.
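The commit message does not say exactly how /\b/ went wrong, so the following is only an illustration of one general pitfall with word boundaries when testing for a name in a delimited list. The list contents and function names are invented for the example and are not taken from BodyRuleBaseExtractor.pm.

```python
import re

# Hypothetical space-separated list of names; not taken from the ruleset.
subsumed = "FOO-BAR BAZ"


def in_list_regex(name, names):
    """Membership test using word boundaries around the name."""
    return re.search(r"\b" + re.escape(name) + r"\b", names) is not None


def in_list_exact(name, names):
    """Membership test that compares whole list entries."""
    return name in names.split()


print(in_list_regex("FOO", subsumed))  # True: \b also matches next to '-'
print(in_list_exact("FOO", subsumed))  # False: 'FOO' is not an entry
```

A word boundary sits between any word character and any non-word character, so a partial name can match. Comparing whole entries avoids that.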
[Bug 5594] only (re)sa-compile channel files that have changed
------- Additional Comments From jm@jmason.org 2007-08-19 14:21 -------
Created an attachment (id=4098)
--> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=4098&action=view)
implementation
OK, here's what I was thinking; it caches the base strings between runs, and
optimizes the "hot spot" in the second half of the algorithm. Results are that
an sa-compile of the default ruleset drops from 35 seconds to 11 secs on my
laptop here -- triple the speed. ;)
this is already in 3.3.0 trunk; this patch is for 3.2.x. Please apply it and
let me know if it works...
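The idea of keeping extraction results between runs can be sketched as below. This is not the code from the attached patch: the JSON file format, the function names, and the `extract` callable (a stand-in for the real base-string extraction) are all assumptions made for illustration.

```python
import json
import os
import tempfile


def load_cache(path):
    """Return the cache saved by a previous run, or an empty one."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def save_cache(path, cache):
    """Write the cache via a temporary file so a crash cannot truncate it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(cache, handle)
    os.replace(tmp, path)


def extract_with_cache(rules, cache_path, extract):
    """Map rule name -> extracted strings, calling `extract` only on misses.

    `rules` maps rule names to regexp source text. `extract` is a
    hypothetical callable returning a list of strings for one regexp.
    """
    cache = load_cache(cache_path)
    results = {}
    for name, regexp in rules.items():
        key = f"{name} {regexp}"
        if key not in cache:
            cache[key] = extract(regexp)
        results[name] = cache[key]
    save_cache(cache_path, cache)
    return results
```

On a second run with an unchanged ruleset, `extract` is never called and every result comes from the file. The sketch never prunes entries for rules that have been removed, which a real implementation would probably want to handle.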
[Bug 5594] [review] speed up sa-compile
------- Additional Comments From jm@jmason.org 2007-12-03 08:29 -------
(In reply to comment #16)
> Justin -- I don't see anything in the patch that avoids using cached data for a
> rule that has been updated/changed since the cache data was created (which must
> be done). Did I miss something?
sorry Daryl -- forgot to reply. :(
the cache's key is made up of both the rule's name, and its value -- ie. the
rule regexp. so if the rule changes, the cache is invalidated.
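The invalidation scheme described above, where the key combines the rule's name with its regexp, can be illustrated with a small in-memory sketch. The class, the rule name `DEMO_RULE`, and the use of `len` as a stand-in for the expensive computation are hypothetical and not taken from the patch.

```python
def cache_key(name, regexp):
    """Combine the rule's name and its regexp into one cache key."""
    return (name, regexp)


class BaseStringCache:
    """In-memory cache of per-rule results, keyed on name and regexp."""

    def __init__(self):
        self._entries = {}
        self.misses = 0

    def get(self, name, regexp, compute):
        """Return the cached result, computing it on the first request."""
        key = cache_key(name, regexp)
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = compute(regexp)
        return self._entries[key]


cache = BaseStringCache()
cache.get("DEMO_RULE", r"/free money/i", len)     # miss: computed
cache.get("DEMO_RULE", r"/free money/i", len)     # hit: same name and regexp
cache.get("DEMO_RULE", r"/free m[o0]ney/i", len)  # miss: regexp changed
print(cache.misses)  # 2
```

Because an edited rule produces a different key, its old entry is simply never looked up again, so no explicit invalidation step is needed.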
[Bug 5594] [review] speed up sa-compile
jm@jmason.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|3.2.3 |3.2.4
[Bug 5594] only (re)sa-compile channel files that have changed
jm@jmason.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|Undefined |3.3.0
------- Additional Comments From jm@jmason.org 2007-08-13 10:33 -------
yep -- what I'm considering is caching the base strings from previous sa-compile
runs, so that later sa-compiles can reload them from cache and save a lot of
needless recomputation.
[Bug 5594] [review] speed up sa-compile
------- Additional Comments From schneecrash+apache@gmail.com 2007-09-17 20:42 -------
(In reply to comment #7)
> yep, it still needs votes.
Doing a clean build onite ... is this "in" yet? Or languishing ...?
[Bug 5594] [review] speed up sa-compile
------- Additional Comments From schneecrash+apache@gmail.com 2007-09-18 08:11 -------
(In reply to comment #9)
> :(
i hear ya'
> it's kind of immaterial until we release the next 3.2.x point release anyway.
> so far there hasn't been a serious issue (apart from in sa-compile) that would
> be fixed,
sure. just a little inconvenient to figure out what's "in" or "not yet ..." with
each build, especially when it's effectively 'done' -- which, of course, is the
good news --> it works great!
and, if/once approved & committed, more folks would be aware of it, then use it,
then appreciate ... jada jada jada
but nbd :-) too much 'philosophy', and not telling you something you don't
already know!
> so that's probably why people aren't hurrying...
heh. we may have slightly different definitions of "hurrying" ;-)
thx!
[Bug 5594] [review] speed up sa-compile
------- Additional Comments From spamassassin@dostech.ca 2007-11-06 12:30 -------
Justin -- I don't see anything in the patch that avoids using cached data for a
rule that has been updated/changed since the cache data was created (which must
be done). Did I miss something?
[Bug 5594] [review] speed up sa-compile
spamassassin@dostech.ca changed:
What |Removed |Added
----------------------------------------------------------------------------
Status Whiteboard|needs 2 votes for 3.2 |needs 1 votes for 3.2
------- Additional Comments From spamassassin@dostech.ca 2007-12-03 08:45 -------
cool, +1
[Bug 5594] speed up sa-compile
jm@jmason.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #4098 is obsolete|0 |1
------- Additional Comments From jm@jmason.org 2007-09-24 13:00 -------
Created an attachment (id=4132)
--> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=4132&action=view)
updated patch
here's an update of patch 4098, to match current SVN trunk.
[Bug 5594] [review] speed up sa-compile
------- Additional Comments From jm@jmason.org 2007-08-27 08:14 -------
yep, it still needs votes.
[Bug 5594] [review] speed up sa-compile
------- Additional Comments From schneecrash+apache@gmail.com 2007-08-27 08:00 -------
just checking.
this patch in/out for 32x branch?
looks like it still needs votes from committers ...
[Bug 5594] [review] speed up sa-compile
schneecrash+apache@gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |REMIND
------- Additional Comments From schneecrash+apache@gmail.com 2007-10-30 15:51 -------
just a bump ...
[Bug 5594] only (re)sa-compile channel files that have changed
------- Additional Comments From schneecrash+apache@gmail.com 2007-08-15 13:09 -------
myriad options, of course, but i'm guessing you're looking at something
persistent ...
http://perl-cache.sourceforge.net perhaps? other?
just curious.
cheers!
[Bug 5594] [review] speed up sa-compile
jm@jmason.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|speed up sa-compile |[review] speed up sa-compile
Status Whiteboard|needs update |needs 2 votes for 3.2