Posted to users@spamassassin.apache.org by Adam Katz <an...@khopis.com> on 2011/11/29 23:22:50 UTC
Martin Gregorie's portmanteau rule building script
On 11/25/2011 10:13 AM, Martin Gregorie wrote:
> Subject: [Fwd: Re: How long a rule can be?]
My main answers to the original thread were posted there (today). I
guess I'm too accustomed to orderly threads: between my threaded view
in Thunderbird and the big pile of mail left unread since before the
holiday, I missed this thread when responding to the original.
If you want to fork the thread into a tangent, please change the subject
so other responses to it don't follow you. Also, don't respond to the
parts of the thread you are not forking; those belong in another message
in the original thread.
</rant>
> If you're finding your rule is starting to get difficult to maintain,
> take a look at my rule assembly tool, which is designed to let each
> such rule be defined in its own easily edited file; those files are
> then used to create a single .cf file. See:
> http://www.libelle-systems.com/free/portmanteau/portmanteau.tgz
>
> I was thinking of using a server plus plugin to do this but was
> convinced that this 'portmanteau rule' approach was better: it
> certainly works well for me.
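The assembly step Martin describes (one easily edited file per rule, all feeding a single .cf file) can be sketched roughly as follows. This is not the portmanteau script, whose input format is not shown in this thread: the one-alternate-per-line file layout, the `.rule` suffix, and the helpers `parse_alternates`, `build_body_rule` and `assemble_directory` are all hypothetical. The emitted line mirrors the `body __AU0` example from the portmanteau man page that Adam quotes.

```python
from pathlib import Path
from typing import Iterable, List


def parse_alternates(text: str) -> List[str]:
    """Hypothetical rule-file format: one alternate per line,
    with blank lines and '#' comment lines ignored."""
    alternates = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            alternates.append(line)
    return alternates


def build_body_rule(name: str, alternates: Iterable[str], flags: str = "i") -> str:
    """OR the alternates together into one SpamAssassin 'body' rule line.
    Alternates are pasted in verbatim, so they may contain regex syntax."""
    joined = "|".join(alternates)
    return f"body {name} /\\b({joined})\\b/{flags}"


def assemble_directory(src: Path) -> str:
    """Build .cf text with one rule per '*.rule' file in src, using each
    file's stem as the rule name (hypothetical layout)."""
    lines = []
    for path in sorted(src.glob("*.rule")):
        alternates = parse_alternates(path.read_text())
        if alternates:
            lines.append(build_body_rule(path.stem, alternates))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    print(build_body_rule("__AU0", ["alt1", "alt2", "alt3"]))
    # prints: body __AU0 /\b(alt1|alt2|alt3)\b/i
```

Keeping each rule's alternates in its own file is what keeps long rules editable; the generator is the only place the long single-line regex ever exists.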
You might want to consider Regexp::Assemble for your tool, though that
would require using perl. This would cause your man page's example rule
to result in something like this:
body __AU0 /(?i-xsm:\balt[123]\b)/
rather than your script's *much* slower:
body __AU0 /\b(alt1|alt2|alt3)\b/i
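Regexp::Assemble is a Perl module, and the Python sketch below is not it. It only imitates one reduction behind the output Adam shows: factor out the common prefix of the alternates, then collapse single-character remainders into a character class. `assemble_literals` and `wrap_like_old_perl` are hypothetical helpers, inputs are treated as literal strings rather than regexes, and the `(?i-xsm:...)` wrapper copies the way Perl versions before 5.14 stringify a `qr//i` object (Python 3.6+ accepts the same scoped-flag syntax).

```python
import os
import re
from typing import Sequence


def assemble_literals(words: Sequence[str]) -> str:
    """Hypothetical, much-simplified imitation of one Regexp::Assemble
    reduction: factor out the shared prefix, then use a character class
    when every remaining suffix is a single character. Inputs are literal
    strings, not regexes."""
    if not words:
        raise ValueError("need at least one word")
    prefix = os.path.commonprefix(list(words))
    suffixes = sorted({w[len(prefix):] for w in words})
    head = re.escape(prefix)
    if suffixes == [""]:  # every word is identical
        return head
    rest = [s for s in suffixes if s]
    # A word equal to the prefix itself makes the remainder optional.
    optional = "?" if len(rest) < len(suffixes) else ""
    if all(len(s) == 1 for s in rest):
        body = "[" + "".join(re.escape(s) for s in rest) + "]"
    else:
        body = "(?:" + "|".join(re.escape(s) for s in rest) + ")"
    return head + body + optional


def wrap_like_old_perl(fragment: str) -> str:
    """Word-bounded, case-insensitive wrapper in the (?i-xsm:...) form that
    pre-5.14 Perls print for qr/.../i; Python 3.6+ accepts it too."""
    return rf"(?i-xsm:\b{fragment}\b)"


if __name__ == "__main__":
    print(wrap_like_old_perl(assemble_literals(["alt1", "alt2", "alt3"])))
    # prints: (?i-xsm:\balt[123]\b)
```

The two forms accept the same words for these inputs; whether the character-class form is actually *much* faster under Perl's engine is the question Martin asks in his reply.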
Re: Martin Gregorie's portmanteau rule building script
Posted by Adam Katz <an...@khopis.com>.
On 11/30/2011 03:59 AM, Martin Gregorie wrote:
> On Tue, 2011-11-29 at 14:22 -0800, Adam Katz wrote:
>> You might want to consider Regexp::Assemble for your tool, though
>> that would require using perl. This would cause your man page's
>> example rule to result in something like this:
>>
>> body __AU0 /(?i-xsm:\balt[123]\b)/
>>
>> rather than your script's *much* slower:
>>
>> body __AU0 /\b(alt1|alt2|alt3)\b/i
>>
> Interesting idea. Currently my system's performance seems 'adequate',
> considering I'm running SA on an 866 MHz P3 box with 512 MB RAM:
>                     Min                 Avg      Max
> Scan times (secs):  0.9 (3401 bytes)    4.0      128.3 (72858 bytes)
> Msg sizes (bytes):  2258 (1.8 secs)     10474    507533 (6.2 secs)
> Messages:           2032
>
> What sort of speed-up would Regexp::Assemble provide?
> How would that compare with compiling the portmanteau.cf file?
Great question. I do not have an answer.
How much optimization does re2c provide? I am under the impression all
it does is convert text-based PCREs to C/C++ code of some sort, which
fully(?) mimics the original regexp's logic, implying that optimization
before compilation matters a lot.
I popped into irc://freenode.net#regex to ask, but this is apparently
too archaic a question. Maybe somebody will have an answer in time. (I
am not motivated enough to create an impromptu benchmark suite myself.)
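A benchmark of the kind mentioned above could start as small as the sketch below, which times the two rule regexes from this thread with Python's `re` and `timeit`. Assumptions: the corpus built by `make_corpus` is synthetic, not real mail, and Python's regex engine is not Perl's, so the numbers say nothing definitive about SpamAssassin. Answering Martin's question would mean rebuilding the same harness in Perl (for example with the core Benchmark module), and comparing against sa-compile output is a separate exercise.

```python
import re
import timeit
from typing import List, Tuple

# The two forms of the example rule regex discussed in this thread.
ALTERNATION = re.compile(r"\b(alt1|alt2|alt3)\b", re.IGNORECASE)
ASSEMBLED = re.compile(r"(?i-xsm:\balt[123]\b)")


def make_corpus(size: int = 200_000) -> str:
    """Synthetic text (an assumption, not real mail): many near-misses
    such as 'alt4' and 'salt1', plus one genuine hit at the very end."""
    filler = "the alternative altitude alto alt4 salt1 "
    return filler * (size // len(filler)) + "ALT2"


def match_spans(pattern: re.Pattern, text: str) -> List[Tuple[int, int]]:
    """Start/end offsets of every match, used to check the patterns agree."""
    return [m.span() for m in pattern.finditer(text)]


def bench(pattern: re.Pattern, text: str, repeat: int = 5, number: int = 20) -> float:
    """Best of `repeat` runs, each timing `number` full scans of `text` (seconds)."""
    timer = timeit.Timer(lambda: match_spans(pattern, text))
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == "__main__":
    corpus = make_corpus()
    # A speed comparison is meaningless unless both regexes match the same text.
    assert match_spans(ALTERNATION, corpus) == match_spans(ASSEMBLED, corpus)
    for label, pattern in (("alternation", ALTERNATION), ("assembled", ASSEMBLED)):
        print(f"{label:<12} {bench(pattern, corpus):.4f} s")
```

Taking the best of several repeats reduces noise from other processes; a real comparison would also use a corpus of actual ham and spam bodies.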
Re: Martin Gregorie's portmanteau rule building script
Posted by Martin Gregorie <ma...@gregorie.org>.
On Tue, 2011-11-29 at 14:22 -0800, Adam Katz wrote:
> If you want to fork the thread into a tangent, please change the subject
> so other responses to it don't follow you. Also, don't respond to the
> parts of the thread you are not forking; those belong in another message
> in the original thread.
>
That wasn't my intention. I *thought* I was merely adding an aside to
say "if you really want rules with lots of alternates, here's a tool
that can help" because I think we've all struggled with rules that
straggle off the right edge of the page in many editors. I know vi/vim
will wrap those lines, but a lot of people dislike vi.
> You might want to consider Regexp::Assemble for your tool, though that
> would require using perl. This would cause your man page's example rule
> to result in something like this:
>
> body __AU0 /(?i-xsm:\balt[123]\b)/
>
> rather than your script's *much* slower:
>
> body __AU0 /\b(alt1|alt2|alt3)\b/i
>
Interesting idea. Currently my system's performance seems 'adequate',
considering I'm running SA on an 866 MHz P3 box with 512 MB RAM:
                    Min                 Avg      Max
Scan times (secs):  0.9 (3401 bytes)    4.0      128.3 (72858 bytes)
Msg sizes (bytes):  2258 (1.8 secs)     10474    507533 (6.2 secs)
Messages:           2032
What sort of speed-up would Regexp::Assemble provide?
How would that compare with compiling the portmanteau.cf file?
Martin