Posted to dev@apr.apache.org by Jacques Amar <ja...@amar.com> on 2008/12/29 20:43:01 UTC

PCRE modules in APR?

Hello,

Sorry if this is already answered elsewhere; I couldn't locate it.

   1. Is there already a Perl Compatible Regular Expression (PCRE)
      wrapper for APR? My understanding is that we simply need to take
      over the calls to malloc and free with the provided function
      pointers. malloc seems easy enough to map to a pool. I'm having a
      conceptual problem with the free portion.
   2. I also rolled my own search and replace routines - but the
      performance sucks on large input. Any suggestions on how to
      manipulate strings with the cutting and stitching required? I've
      used an APR_ARRAY push during processing and cat it all together
      once done. Perl code doing the same s/// takes way less time.

Let me know if I need to provide code examples.
Thanks

Jacques

Re: PCRE modules in APR?

Posted by Jacques Amar <ja...@amar.com>.
Thanks for the reply.

Wes Garland wrote:
> 1. There is no APR equivalent for free, as it is neither needed nor 
> desired.   Simply allocate your memory from a pool, and destroy the 
> pool when it is no longer needed.  I would suggest making a subpool on 
> RE create and bury it in an opaque pointer describing your RE, if 
> you're actually going to go whole-hog on this. Me?   I use the OS 
> regexec/regcomp  (search only) and register an apr_pool_cleanup 
> handler to avoid leaking memory.
I'm creating a series of pre-compiled/analyzed regexes at 
server startup, and doing a lot of S&R during processing. I do create 
a dedicated pool for this; however, I can never destroy it, because the 
pre-compiled expressions are stored there and should stay there until 
server shutdown. And the PCRE documentation states that I should set one 
memory allocation function before first use. I will try to use one 
pool for the regex creation and another for the search part, 
and see if that works.
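
On the free question, here is a minimal sketch in plain C, with no APR or PCRE dependency, of routing a library's allocation hooks to a pool-like arena where free does nothing and everything is released when the arena is destroyed. The names arena_t, lib_malloc, lib_free, hook_malloc, hook_free and use_arena are hypothetical stand-ins (for apr_pool_t/apr_palloc and for PCRE's pcre_malloc/pcre_free function pointers); treat it as an illustration under those assumptions, not as code tested against either library.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pool: one block, bump-allocated,
 * released all at once by arena_destroy(). */
typedef struct {
    unsigned char *base;
    size_t used;
    size_t cap;
} arena_t;

/* The arena the hooks currently allocate from (global, like the hooks). */
arena_t *current_arena;

/* Stand-ins for a library's replaceable allocation hooks. */
void *(*lib_malloc)(size_t) = malloc;
void (*lib_free)(void *) = free;

int arena_init(arena_t *a, size_t cap)
{
    a->base = malloc(cap);
    a->used = 0;
    a->cap = a->base ? cap : 0;
    return a->base != NULL;
}

void arena_destroy(arena_t *a)
{
    free(a->base);
    a->base = NULL;
    a->used = 0;
    a->cap = 0;
}

/* Allocation hook: carve the next 16-byte-aligned chunk out of the arena. */
void *hook_malloc(size_t n)
{
    arena_t *a = current_arena;
    size_t start;

    if (a == NULL || a->base == NULL)
        return NULL;
    start = (a->used + 15) & ~(size_t)15;
    if (start > a->cap || n > a->cap - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Free hook: intentionally a no-op; the memory goes away with the arena. */
void hook_free(void *p)
{
    (void)p;
}

/* Point the hooks at an arena. */
void use_arena(arena_t *a)
{
    current_arena = a;
    lib_malloc = hook_malloc;
    lib_free = hook_free;
}
```

Because the hooks in this sketch are global, the arena has to be selected before each call into the library, which is not thread-safe as written. Using a long-lived arena for compiled patterns and a short-lived one for per-request work would need care on that point.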
>
> 2. Personally, I would never roll my own search and replace except 
> under exceptional circumstances. That said, your approach doesn't 
> sound unreasonable, but it's difficult to say what your problem is 
> without profiling the code and looking at memory consumption. Start by 
> consulting the literature, S&R is a well-understood problem; and maybe 
> google some stuff on ropes, they may serve you better than strings.
>
For those interested, I traced the issue to UTF-8 handling: the PCRE_UTF8 
flag significantly slows down the searches. Not all my regexes need 
UTF-8 enabled, only those dealing with embedded strings, so I 
shaved a lot of time off by being more selective.

> Here's a paper on ropes which discusses concatenation, which *should* 
> be where you're spending your search and replace time: 
> http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue12/spe986.pdf

Will read, thanks! But with UTF-8 out of the way,
output = apr_array_pstrcat ( subpool, strip_arr, 0 );
works perfectly fine and fast.
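
For readers following along, the "push the pieces, concatenate once at the end" approach can be sketched in plain C roughly as below. This is not the poster's code: it uses strstr instead of PCRE and malloc instead of a pool, and piece_t and replace_all are hypothetical names. The point it illustrates is that the output is sized once and copied once, rather than being regrown on every match.

```c
#include <stdlib.h>
#include <string.h>

/* One slice of the output: a pointer into the input or the replacement. */
typedef struct {
    const char *ptr;
    size_t len;
} piece_t;

/* Replace every occurrence of needle in input with repl.
 * Returns a malloc'd string the caller frees, or NULL on allocation
 * failure or if needle is empty. */
char *replace_all(const char *input, const char *needle, const char *repl)
{
    size_t nlen = strlen(needle), rlen = strlen(repl);
    size_t cap = 16, count = 0, total = 0, i;
    const char *cur = input;
    piece_t *pieces;
    char *out, *dst;

    if (nlen == 0)
        return NULL;
    pieces = malloc(cap * sizeof *pieces);
    if (pieces == NULL)
        return NULL;

    for (;;) {
        const char *hit = strstr(cur, needle);
        size_t keep = hit ? (size_t)(hit - cur) : strlen(cur);

        /* Each pass adds at most two pieces; grow the array if needed. */
        if (count + 2 > cap) {
            piece_t *grown = realloc(pieces, 2 * cap * sizeof *pieces);
            if (grown == NULL) {
                free(pieces);
                return NULL;
            }
            pieces = grown;
            cap *= 2;
        }
        /* Push the untouched text before the match (or the tail). */
        pieces[count].ptr = cur;
        pieces[count].len = keep;
        count++;
        total += keep;
        if (hit == NULL)
            break;
        /* Push the replacement and skip past the match. */
        pieces[count].ptr = repl;
        pieces[count].len = rlen;
        count++;
        total += rlen;
        cur = hit + nlen;
    }

    /* Single allocation and single pass to stitch everything together. */
    out = malloc(total + 1);
    if (out != NULL) {
        dst = out;
        for (i = 0; i < count; i++) {
            memcpy(dst, pieces[i].ptr, pieces[i].len);
            dst += pieces[i].len;
        }
        *dst = '\0';
    }
    free(pieces);
    return out;
}
```

With these definitions, replace_all("a-b-c", "-", "+") yields "a+b+c".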
>
> Note - if your S&R is regexp instead of strcmp, you could also be 
> spending most of your time in the regex state machine. Profile!
>
> Wes
Correct!

I guess I now have to deal with my UTF-8 issues... ugh. I wonder if 
UTF-16 would be faster as all chars are 2 bytes long. I'll also try 
memcached to cache the results so I don't have to do the same processing 
on every request.
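
A note on the UTF-16 idea: UTF-16 is also a variable-width encoding, since code points above U+FFFF take two 16-bit units (a surrogate pair), so "all chars are 2 bytes" only holds for text restricted to the Basic Multilingual Plane. A small sketch of the encoding rule follows; utf16_units and utf16_encode are hypothetical helper names, not part of APR or PCRE.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 16-bit code units UTF-16 needs for one code point,
 * or 0 if the value is not a valid Unicode scalar value. */
size_t utf16_units(uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;               /* surrogate range is reserved */
    if (cp > 0x10FFFF)
        return 0;               /* beyond the Unicode range */
    return cp < 0x10000 ? 1 : 2;
}

/* Encode one code point into out[]; returns the number of units written. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    size_t n = utf16_units(cp);

    if (n == 1) {
        out[0] = (uint16_t)cp;
    } else if (n == 2) {
        cp -= 0x10000;
        out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
        out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    }
    return n;
}
```

For example, U+0041 and U+20AC each take one unit, while U+1F600 takes two (0xD83D, 0xDE00).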

Thanks again

Jacques

Re: PCRE modules in APR?

Posted by Wes Garland <we...@page.ca>.
1. There is no APR equivalent for free, as it is neither needed nor
desired.   Simply allocate your memory from a pool, and destroy the pool
when it is no longer needed.  I would suggest making a subpool on RE create
and bury it in an opaque pointer describing your RE, if you're actually
going to go whole-hog on this. Me?   I use the OS regexec/regcomp  (search
only) and register an apr_pool_cleanup handler to avoid leaking memory.
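
The regcomp/regexec-plus-cleanup pattern described in point 1 might look roughly like the following sketch. It uses only POSIX <regex.h>; re_handle_t, re_compile, re_matches and re_cleanup are hypothetical names. The cleanup function takes a void * and returns an int status, which is the general shape of an APR pool cleanup callback, but the registration call itself is left out here so the example has no APR dependency.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical wrapper: the compiled regex plus a flag recording
 * whether regfree() still needs to run. */
typedef struct {
    regex_t re;
    int compiled;
} re_handle_t;

/* Cleanup callback: releases the compiled regex exactly once.
 * Intended to be registered against whatever owns the handle's lifetime. */
int re_cleanup(void *data)
{
    re_handle_t *h = data;

    if (h->compiled) {
        regfree(&h->re);
        h->compiled = 0;
    }
    return 0;
}

/* Compile an extended regex for match/no-match use. Returns 1 on success. */
int re_compile(re_handle_t *h, const char *pattern)
{
    h->compiled = (regcomp(&h->re, pattern, REG_EXTENDED | REG_NOSUB) == 0);
    return h->compiled;
}

/* Search only: returns 1 if subject matches, 0 otherwise. */
int re_matches(const re_handle_t *h, const char *subject)
{
    return h->compiled && regexec(&h->re, subject, 0, NULL, 0) == 0;
}
```

In an APR setting the handle itself would presumably be allocated from the pool and re_cleanup registered on that same pool, so that destroying the pool frees both.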

2. Personally, I would never roll my own search and replace except under
exceptional circumstances. That said, your approach doesn't sound
unreasonable, but it's difficult to say what your problem is without
profiling the code and looking at memory consumption. Start by consulting
the literature, S&R is a well-understood problem; and maybe google some
stuff on ropes, they may serve you better than strings.

Here's a paper on ropes which discusses concatenation, which *should* be
where you're spending your search and replace time:
www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue12/spe986.pdf

Note - if your S&R is regexp instead of strcmp, you could also be spending
most of your time in the regex state machine. Profile!

Wes

Re: PCRE modules in APR?

Posted by Jacques Amar <ja...@amar.com>.
Nick Kew wrote:
>
> On 29 Dec 2008, at 19:43, Jacques Amar wrote:
>
>> Hello,
>>
>> Sorry if this is already answered elsewhere, couldn't locate it.
>>
>> Is there already a Perl Compatible Regular Expression (PCRE) wrapper 
>> for APR? My understanding is that we simply need to take over the 
>> calls to malloc and free with the provided function pointers. malloc 
>> seems easy enough to map to a pool. I'm having a conceptual problem 
>> with the free portion.
>
> No.  There is a PCRE wrapper in httpd, but that just exposes the old
> regexp API.
>
>> I also rolled my own search and replace routines - but the 
>> performance sucks on large input. Any suggestions on how to 
>> manipulate strings with the cutting and stitching required? I've used 
>> an APR_ARRAY push during processing and cat it all together once 
>> done. perl code doing the same s/// takes way less time.
>
> Have you looked at the APR-ified sed code in mod_sed (httpd again)?
>
>> Let me know if I need to provide code examples.
>
> Are you suggesting an APR-ified PCRE is going to yield substantial
> performance benefits, and are you offering to do the work?  If so,
> it could be a worthwhile addition.
>

As I mentioned in another reply, I traced the performance issue to UTF-8.

The only reason I'm using PCRE is that the search expressions are rather 
complex and regex is one way to describe them - and I know regex pretty 
well. The solution I've rolled can be augmented into an APU module with 
general PCRE Search and Replace, if I clean it up. It's using many 
separate pools right now, that I create in advance for (assumed) 
performance reasons.

I'll have to see if I can simplify it and make it more general purpose, 
so it's not tied to my module, maybe using the optional functions or 
the provider API (need to re-read chapter 10!). I'll probably need 
help with the Makefiles, so I'll be back.

Sidenote: Love your book! Chapter 3, on APR, could use more complex 
examples. For instance, I struggled quite a bit with advanced 
hashes/tables and had to hunt down example code to understand them 
better. Mind you, you learn more by trying.

Thanks

Jacques