You are viewing a plain text version of this content. The canonical link for it is here.
Posted to oro-user@jakarta.apache.org by Janek Bogucki <ja...@yahoo.co.uk> on 2001/10/16 11:19:14 UTC

Re: Performance dips with a four matches - thanks

Thanks very much for your help. In the early stages of
developing the pattern I actually used (.*) for my
enclosed content but was unaware of SINGLELINE_MASK
and so I used the costly alternation instead.

All the best,
Janek Bogucki

 --- "Daniel F. Savarese" <df...@savarese.org> wrote: > 
> In message
>
<20...@web20901.mail.yahoo.com>,
> =?iso-8859-1?q
> ?Janek=20Bogucki?= writes:
> >Here is a reduced, standalone test class,
> >Performance.java.
> 
> Thanks.  It's a lot easier to see what the pattern
> is now and what
> match attempts are taking the most time.  This is
> the culprit:
> 
> >    /*
> >     * enclosed content
> >     */
> >    "((\\s|.)*)" +
> 
> Alternations are expensive, but much more so when a
> subpattern has the
> potential to match the entire remaining input on
> every backtrack.  Instead
> of this you should use:
> 
>      "(.*)" +
> 
> and compile the pattern with
> Perl5Compiler.SINGLELINE_MASK so that
> . will match newlines:
> 
> _pattern =
> compiler.compile(SERVER_ACTIVE_HTML_PATTERN,
>                            
> Perl5Compiler.CASE_INSENSITIVE_MASK |
>                            
> Perl5Compiler.SINGLELINE_MASK |
>                            
> Perl5Compiler.READ_ONLY_MASK);
> 
> All the matches should now take less than can be
> measured with
> System.currentTimeMillis().
> 
> The problem with writing regular expressions is that
> you often have
> to know something about how they are implemented in
> order to avoid
> writing inefficient ones.  For example, the
> expression you used would
> be fine with the Awk package because AwkCompiler
> will reduce all equivalent
> regular expressions to the same DFA (at the price
> that it only works with
> 8-bit character input).  But the Perl package (and
> Perl itself) treats
> the expressions as NFAs and has to dynamically
> search the state space of
> possible matches rather than relying on
> predetermined state transitions.
> "Mastering Regular Expressions" by Jeffrey Friedl
> does a pretty good job
> of explaining how regular expressions typically work
> and how to avoid
> writing "slow" expressions.  The trick is to avoid
> constructs that will
> cause a lot of backtracking.
> 
> daniel
> 
>  

____________________________________________________________
Do You Yahoo!?
Get your free @yahoo.co.uk address at http://mail.yahoo.co.uk
or your free @yahoo.ie address at http://mail.yahoo.ie