Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2007/08/13 19:14:15 UTC

[Bug 5594] New: only (re)sa-compile channel files that have changed

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594

           Summary: only (re)sa-compile channel files that have changed
           Product: Spamassassin
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: sa-compile
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: schneecrash+apache@gmail.com


> i use sa-update to update/maintain three separate source channels of rules,
> 
> 	sudo -u spam sa-update --channelfile DIST-ch.conf
> 	sudo -u spam sa-update --channelfile SARE-ch.conf --gpgkey 856AA88A
> 	sudo -u spam sa-update --channelfile JMAS-ch.conf --gpgkey 6C6191E3
> 
> where, fwiw,
> 
> 	cat DIST-ch.conf
> 		updates.spamassassin.org
> 	
> 	cat SARE-ch.conf
> 		70_sare_obfu.cf.sare.sa-update.dostech.net
> 		72_sare_redirect_post3.0.0.cf.sare.sa-update.dostech.net
> 		70_sare_evilnum0.cf.sare.sa-update.dostech.net
> 		70_sare_evilnum1.cf.sare.sa-update.dostech.net
> 		70_sare_bayes_poison_nxm.cf.sare.sa-update.dostech.net
> 		70_sare_header.cf.sare.sa-update.dostech.net
> 		70_sare_header_eng.cf.sare.sa-update.dostech.net
> 		99_sare_fraud_post25x.cf.sare.sa-update.dostech.net
> 		70_sare_spoof.cf.sare.sa-update.dostech.net
> 		...
> 	
> 	cat JMAS-ch.conf
> 		sought.rules.yerp.org
> 
> works great manually &/or via cron job.
> 
> i've *also* turned on,
> 
> 	# Rule2XSBody - speedup by compilation of ruleset to native code
> 	loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody
> 
> in my init.pre.
> 
> currently, my sa-update cron-job script detects ANY changes to ANY of the
> existing channel files, and if a change exists, RE-compiles the whole set of rules.
> 
> as the number of channels managed grows, the odds of a *single* channel being
> updated increase, as, then, does the probability that a RE-compile will be done.
> 
> inefficient.
> 
> i (think i) can simply cobble up a script to only re-compile rules for those
> channel files that HAVE been updated/changed, but thought i'd ask here first ...
> 
> *is* there a clever script or process already available that would only
> re-compile those rules that NEED recompiling?
> 
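
For illustration only, the change detection that the cron job described above performs could be sketched roughly as follows. This is a minimal sketch, not part of sa-update or sa-compile: the function names, the JSON state file and the demo file names are all hypothetical, and it only reports which rule files differ from the previous run. Whether sa-compile can then rebuild just those rules is the open question of this report.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def digest_rules(rules_dir):
    """Map each *.cf file under rules_dir to the SHA-256 of its contents."""
    return {
        str(path.relative_to(rules_dir)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(Path(rules_dir).rglob("*.cf"))
    }


def changed_files(rules_dir, state_file):
    """Return rule files added, removed or modified since the previous call."""
    state_path = Path(state_file)
    old = json.loads(state_path.read_text()) if state_path.exists() else {}
    new = digest_rules(rules_dir)
    state_path.write_text(json.dumps(new))  # remember digests for the next run
    return sorted(n for n in old.keys() | new.keys() if old.get(n) != new.get(n))


# Demo on a throwaway directory (hypothetical file and rule names).
with tempfile.TemporaryDirectory() as tmp:
    rules = Path(tmp, "rules")
    rules.mkdir()
    state = Path(tmp, "state.json")
    (rules / "a.cf").write_text("body RULE_A /foo/\n")
    first = changed_files(rules, state)   # first run: every file counts as new
    second = changed_files(rules, state)  # nothing touched in between
    (rules / "a.cf").write_text("body RULE_A /bar/\n")
    third = changed_files(rules, state)   # a.cf was modified
    print(first, second, third)
```

A cron job built this way would skip the compile step entirely when the returned list is empty; a non-empty list still triggers a compile of the whole ruleset.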


From: Justin Mason
 Unfortunately -- not yet.   It'll take code changes to sa-compile,
 specifically to cache the "base strings" somewhere so they don't have to
 be re-extracted next time.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5594] [review] speed up sa-compile

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594





------- Additional Comments From jm@jmason.org  2007-09-18 00:49 -------
> Doing a clean build onite ... is this "in" yet? Or languishing ...?

the latter :(

it's kind of immaterial until we release the next 3.2.x point release anyway. 
so far there hasn't been a serious issue (apart from in sa-compile) that would
be fixed, so that's probably why people aren't hurrying...




[Bug 5594] [review] speed up sa-compile

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594


sidney@sidney.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Status Whiteboard|needs 1 votes for 3.2       |ready to commit for 3.2




------- Additional Comments From sidney@sidney.com  2007-12-16 04:36 -------
+1





[Bug 5594] [review] speed up sa-compile

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED




------- Additional Comments From jm@jmason.org  2007-12-16 13:40 -------
applied to 3.2.x: r604716




[Bug 5594] [review] speed up sa-compile

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|REMIND                      |




------- Additional Comments From jm@jmason.org  2007-10-30 16:18 -------
reopening




[Bug 5594] [review] speed up sa-compile

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594


maddoc@maddoc.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Status Whiteboard|needs 2 votes for 3.2       |needs 1 votes for 3.2




------- Additional Comments From maddoc@maddoc.net  2007-08-19 15:53 -------
+1 works good for me.

SA 3.2.3
real    5m24.817s
user    4m46.614s
sys     0m17.141s
SA 3.2.3 with sa-compile patch
real    3m25.274s
user    2m59.798s
sys     0m17.776s
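
As a quick check of what these numbers mean, the two "real" times can be converted to seconds and compared. This is only a small arithmetic sketch; the helper name `to_seconds` is made up for the example.

```python
def to_seconds(stamp):
    """Convert a `time`-style stamp such as '5m24.817s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


before = to_seconds("5m24.817s")  # SA 3.2.3
after = to_seconds("3m25.274s")   # SA 3.2.3 with sa-compile patch

speedup = before / after          # how many times faster the patched run is
saved = 1 - after / before        # fraction of wall-clock time saved

print(f"{speedup:.2f}x faster, {saved:.1%} less wall-clock time")
```

On these figures the patched run is a little under 1.6 times as fast, i.e. it saves a bit more than a third of the wall-clock time.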






[Bug 5594] [review] speed up sa-compile

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|enhancement                 |major




------- Additional Comments From jm@jmason.org  2007-09-26 03:22 -------
that OOM bug was present in the pre-caching code in 3.2.3.  this is therefore a
major fix




[Bug 5594] [review] speed up sa-compile

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594





------- Additional Comments From schneecrash+apache@gmail.com  2007-08-19 15:47 -------
hi,

> this is already in 3.3.0 trunk; this patch is for 3.2.x.  Please apply it and
> let me know if it works...

w/ co of 32x-branch, r567485 + patch(sa_bug5594_patch4098.txt)

now, @ sa-compile, in the compile/debug output, I see the requisite 'evidence'
of caching, e.g.,

  ...
  [16751] dbg: zoom: YES (cached) /\doo[o0] d[o0][l|][l|]ars/i
  [16751] dbg: zoom: NO (cached) /[\s-][a-z01]{1,10}[1|]ve[a-z01]{0,10}[,.?!]*[\s-]/i
  ...

and both sa-compile & --lint of my full/production ruleset are error-free.

so, that's good.

the sa-compile process is, apparently, faster, now.  on *this* ancient-box (an
old 333MHz, 832Mb Mac/G3! Hey, I'm at the cottage ... life's slower here ...)
the full ruleset compile-time dropped from ~42 mins to ~22 mins.

Now, iiuc, this patch speeds up compilation, but it is still re-compiling "all"
the in-place rules, yes?






[Bug 5594] [review] speed up sa-compile

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|only (re)sa-compile channel |[review] speed up sa-compile
                   |files that have changed     |
  Status Whiteboard|                            |needs 2 votes for 3.2
   Target Milestone|3.3.0                       |3.2.3







[Bug 5594] speed up sa-compile

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[review] speed up sa-compile|speed up sa-compile
  Status Whiteboard|needs 1 votes for 3.2       |needs update




------- Additional Comments From jm@jmason.org  2007-09-19 05:20 -------
stop the presses -- this needs to be updated to include the fix from r577272,
which fixes an OOM:

: jm 25...; svn commit -m "avoid massive memory blow-up in sa-compile; seems
/\b/ isn't the right thing to use when matching the list of already-subsumed
base string names" lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
Sending        lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
Transmitting file data .
Committed revision 577272.





[Bug 5594] only (re)sa-compile channel files that have changed

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594





------- Additional Comments From jm@jmason.org  2007-08-19 14:21 -------
Created an attachment (id=4098)
 --> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=4098&action=view)
implementation

OK, here's what I was thinking; it caches the base strings between runs, and
optimizes the "hot spot" in the second half of the algorithm.  Results are that
an sa-compile of the default ruleset drops from 35 seconds to 11 secs on my
laptop here -- triple the speed. ;)

this is already in 3.3.0 trunk; this patch is for 3.2.x.  Please apply it and
let me know if it works...




[Bug 5594] [review] speed up sa-compile

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594





------- Additional Comments From jm@jmason.org  2007-12-03 08:29 -------
(In reply to comment #16)
> Justin -- I don't see anything in the patch that avoids using cached data for a
> rule that has been updated/changed since the cache data was created (which must
> be done).  Did I miss something?

sorry Daryl -- forgot to reply. :(

the cache's key is made up of both the rule's name, and its value -- ie. the
rule regexp.  so if the rule changes, the cache is invalidated.
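
A rough illustration of that keying scheme follows. It is a toy sketch, not the code from the patch: the class, the hashing choice, the rule name and the stand-in extraction function are all assumptions made for the example. The point is only that a key built from both the name and the regexp stops matching as soon as the regexp is edited, so the stale entry is never used.

```python
import hashlib


class BaseStringCache:
    """Toy in-memory cache keyed on (rule name, rule regexp)."""

    def __init__(self):
        self._entries = {}
        self.misses = 0

    @staticmethod
    def _key(name, regexp):
        # Hash name and regexp together; the NUL separator keeps
        # ("ab", "c") distinct from ("a", "bc").
        return hashlib.sha256(f"{name}\0{regexp}".encode("utf-8")).hexdigest()

    def lookup(self, name, regexp, extract):
        """Return cached data for the rule, calling extract() only on a miss."""
        key = self._key(name, regexp)
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = extract(regexp)
        return self._entries[key]


def fake_extract(regexp):
    # Stand-in for the expensive base-string extraction step.
    return regexp.strip("/").lower()


cache = BaseStringCache()
cache.lookup("HYPOTHETICAL_RULE", "/dollars/", fake_extract)  # miss, computed
cache.lookup("HYPOTHETICAL_RULE", "/dollars/", fake_extract)  # hit, reused
cache.lookup("HYPOTHETICAL_RULE", "/d0llars/", fake_extract)  # miss: regexp changed
print(cache.misses)
```

In this sketch the old entry simply becomes unreachable rather than being deleted; a persistent cache would also need some way to prune such entries, which is outside what the comment above describes.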




[Bug 5594] [review] speed up sa-compile

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|3.2.3                       |3.2.4







[Bug 5594] only (re)sa-compile channel files that have changed

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|Undefined                   |3.3.0




------- Additional Comments From jm@jmason.org  2007-08-13 10:33 -------
yep -- what I'm considering is caching the base strings from previous sa-compile
runs, so that later sa-compiles can reload them from cache and save a lot of
needless recomputation.




[Bug 5594] [review] speed up sa-compile

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594





------- Additional Comments From schneecrash+apache@gmail.com  2007-09-17 20:42 -------
(In reply to comment #7)
> yep, it still needs votes.

Doing a clean build onite ... is this "in" yet? Or languishing ...?




[Bug 5594] [review] speed up sa-compile

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594





------- Additional Comments From schneecrash+apache@gmail.com  2007-09-18 08:11 -------
(In reply to comment #9)

> :(

i hear ya'

> it's kind of immaterial until we release the next 3.2.x point release anyway. 
> so far there hasn't been a serious issue (apart from in sa-compile) that would
> be fixed,

sure. just a little inconvenient to figure out what's "in" or "not yet ..." with
each build, especially when it's effectively 'done' -- which, of course, is the
good news --> it works great!

and, if/once approved & committed, more folks would be aware of it, then use it,
then appreciate ... yada yada yada

but nbd :-) too much 'philosophy', and not telling you something you don't
already know!

> so that's probably why people aren't hurrying...

heh. we may have slightly different definitions of "hurrying" ;-)

thx!




[Bug 5594] [review] speed up sa-compile

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594





------- Additional Comments From spamassassin@dostech.ca  2007-11-06 12:30 -------
Justin -- I don't see anything in the patch that avoids using cached data for a
rule that has been updated/changed since the cache data was created (which must
be done).  Did I miss something?




[Bug 5594] [review] speed up sa-compile

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594


spamassassin@dostech.ca changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Status Whiteboard|needs 2 votes for 3.2       |needs 1 votes for 3.2




------- Additional Comments From spamassassin@dostech.ca  2007-12-03 08:45 -------
cool, +1




[Bug 5594] speed up sa-compile

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #4098 is|0                           |1
           obsolete|                            |




------- Additional Comments From jm@jmason.org  2007-09-24 13:00 -------
Created an attachment (id=4132)
 --> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=4132&action=view)
updated patch

here's an update of patch 4098, to match current SVN trunk.




[Bug 5594] [review] speed up sa-compile

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594





------- Additional Comments From jm@jmason.org  2007-08-27 08:14 -------
yep, it still needs votes.




[Bug 5594] [review] speed up sa-compile

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594





------- Additional Comments From schneecrash+apache@gmail.com  2007-08-27 08:00 -------
just checking.

this patch in/out for 32x branch?

looks like it still needs votes from committers ...




[Bug 5594] [review] speed up sa-compile

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594


schneecrash+apache@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |REMIND




------- Additional Comments From schneecrash+apache@gmail.com  2007-10-30 15:51 -------
just a bump ...




[Bug 5594] only (re)sa-compile channel files that have changed

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594





------- Additional Comments From schneecrash+apache@gmail.com  2007-08-15 13:09 -------
myriad options, of course, but i'm guessing you're looking at something
persistent ...

http://perl-cache.sourceforge.net perhaps? other?

just curious.

cheers!




[Bug 5594] [review] speed up sa-compile

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5594


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|speed up sa-compile         |[review] speed up sa-compile
  Status Whiteboard|needs update                |needs 2 votes for 3.2





