Posted to users@spamassassin.apache.org by Matt Kettler <mk...@evi-inc.com> on 2005/04/01 00:16:25 UTC

Re: Rule Design Benchmark/Resource Question

Rocky Olsen wrote:

>Before i pull my hair out doing bench/resource test, i was wondering if
>anyone out there knew if there was much of a speed/resource usage
>difference between the following way of writing the same rule.
>
>
>Method A:
>body	rule_a		/(?:feh|meh|bleh)/i
>
>vs.
>
>Method B:
>
>body	__rule_a	/(?:feh)/i
>body	__rule_b	/(?:meh)/i
>body	__rule_c	/(?:bleh)/i
>
>meta	rule_d		(__rule_a || __rule_b || __rule_c)
>
>
>There probably isn't much difference using just 3 rules, but i'm thinking
>more along the lines of large(500+) lists and it isn't limited to just body
>stuff.  So if anyone has some realworld benching/experience with what is
>preferred or if the developers know which is faster for SA, i would love
>the input.
>  
>

To start with, perl's regex debugger is your friend:

$ perl -Mre=debug -e "/(?:feh|meh|bleh)/i"
size 11 Got 92 bytes for offset annotations.

$ perl -Mre=debug -e  "/(?:feh)/i"
Freeing REx: `","'
Compiling REx `(?:feh)'
size 3 Got 28 bytes for offset annotations.

(repeat for /(?:meh)/i and /(?:bleh)/i)

However, this only deals with part of the story: the cost of the regex
itself. It does not deal with the per-rule overhead in SA.
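
The regex-only cost can also be timed directly instead of read off the
debugger output. A rough sketch using Perl's core Benchmark module
follows. The sample text, the sub names, and the choice to run all three
separate patterns with no early exit (on the assumption that SA evaluates
each sub-rule before the meta rule combines them) are illustrative
assumptions, and it still measures nothing of SA's per-rule overhead.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample body text; a real test should use real mail.
my $text = join ' ', ('lorem ipsum dolor sit amet') x 200, 'bleh';

# Method A: one regex with alternation.
my $combined = qr/(?:feh|meh|bleh)/i;

# Method B: one regex per word.
my @separate = (qr/feh/i, qr/meh/i, qr/bleh/i);

sub match_combined {
    return $text =~ $combined ? 1 : 0;
}

# Assumption: every sub-rule runs, then the results are ORed together,
# so there is deliberately no early return here.
sub match_separate {
    my $hit = 0;
    for my $re (@separate) {
        $hit = 1 if $text =~ $re;
    }
    return $hit;
}

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    combined => \&match_combined,
    separate => \&match_separate,
});
```

The numbers depend heavily on the perl version and on the text being
scanned, so treat the table as a starting point rather than an answer.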

In general I'd favor the combined approach, unless for some reason your
combined rule is considerably larger than the sum of its parts. Bigevil
ran much better once Chris S did some combining and common subexpression
elimination.
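
For the three example words, common subexpression elimination could look
like pulling the shared "eh" out of the alternation. The factored pattern
below is a hypothetical illustration, not taken from Bigevil, and the
sample strings are made up. The script only checks that both forms agree;
whether the factored form is actually faster is something to measure
rather than assume.

```perl
use strict;
use warnings;

# As SA rules these would be (rule name is only an example):
#   body RULE_A  /(?:feh|meh|bleh)/i
#   body RULE_A  /(?:f|m|bl)eh/i
my $plain    = qr/(?:feh|meh|bleh)/i;
my $factored = qr/(?:f|m|bl)eh/i;

# Made-up sample strings; real testing should use a real mail corpus.
my @samples = ('feh', 'MEH!', 'a bleh b', 'eh', 'blah', 'fe h', '');

my $mismatches = 0;
for my $s (@samples) {
    my $p = ($s =~ $plain)    ? 1 : 0;
    my $f = ($s =~ $factored) ? 1 : 0;
    $mismatches++ if $p != $f;
    printf "%-12s plain=%d factored=%d\n", "'$s'", $p, $f;
}
print "mismatches: $mismatches\n";
```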




Also, I'd suggest eliminating the (?:) for the single-text matches. It
does nothing useful, and doesn't change how the regex is evaluated for a
simple single-text match. All it does is waste 4 bytes of disk space per
rule.

body __RULE_A   /feh/i

instead of:
body __RULE_A   /(?:feh)/i

I leave comparing the two using re=debug as an exercise for the student.
Also compare to /(feh)/i and /(feh)\1/i to see what capturing groups and
backreferences add.
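
Apart from re=debug, the visible difference between these four forms can
be checked from plain Perl. A small sketch, with made-up input strings and
labels of my choosing, that reports whether each pattern matches and
whether $1 gets set:

```perl
use strict;
use warnings;

my @patterns = (
    [ plain      => qr/feh/i     ],
    [ noncapture => qr/(?:feh)/i ],
    [ capture    => qr/(feh)/i   ],
    [ backref    => qr/(feh)\1/i ],
);

my %seen;
for my $text ('feh', 'fehfeh') {
    for my $p (@patterns) {
        my ($name, $re) = @$p;
        my $outcome = 'no match';
        if ($text =~ $re) {
            # $1 is only defined when the pattern has a capturing group.
            $outcome = defined $1 ? "match, \$1='$1'" : 'match, no capture';
        }
        $seen{"$text/$name"} = $outcome;
        printf "%-7s %-11s %s\n", $text, $name, $outcome;
    }
}
```

The plain and (?:) forms behave the same here; only the capturing forms
set $1, and the backreference form needs the captured text twice in a row.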








Re: Rule Design Benchmark/Resource Question

Posted by Rocky Olsen <ro...@mindphone.org>.
Thanks

-- 
______________________________________________________________________


what's with today, today?

Email:	rocky@mindphone.org
PGP:	http://rocky.mindphone.org/rocky_mindphone.org.gpg