You are viewing a plain text version of this content. The canonical link for it is here.
Posted to oro-user@jakarta.apache.org by Janek Bogucki <ja...@yahoo.co.uk> on 2001/10/16 11:19:14 UTC
Re: Performance dips with a four matches - thanks
Thanks very much for your help. In the early stages of
developing the pattern I actually used (.*) for my
enclosed content but was unaware of SINGLELINE_MASK
and so I used the costly alternation instead.
All the best,
Janek Bogucki
--- "Daniel F. Savarese" <df...@savarese.org> wrote: >
> In message
>
<20...@web20901.mail.yahoo.com>,
> =?iso-8859-1?q
> ?Janek=20Bogucki?= writes:
> >Here is a reduced, standalone test class,
> >Performance.java.
>
> Thanks. It's a lot easier to see what the pattern
> is now and what
> match attempts are taking the most time. This is
> the culprit:
>
> > /*
> > * enclosed content
> > */
> > "((\\s|.)*)" +
>
> Alternations are expensive, but much more so when a
> subpattern has the
> potential to match the entire remaining input on
> every backtrack. Instead
> of this you should use:
>
> "(.*)" +
>
> and compile the pattern with
> Perl5Compiler.SINGLELINE_MASK so that
> . will match newlines:
>
> _pattern =
> compiler.compile(SERVER_ACTIVE_HTML_PATTERN,
>
> Perl5Compiler.CASE_INSENSITIVE_MASK |
>
> Perl5Compiler.SINGLELINE_MASK |
>
> Perl5Compiler.READ_ONLY_MASK);
>
> All the matches should now take less than can be
> measured with
> System.currentTimeMillis().
>
> The problem with writing regular expressions is that
> you often have
> to know something about how they are implemented in
> order to avoid
> writing inefficient ones. For example, the
> expression you used would
> be fine with the Awk package because AwkCompiler
> will reduce all equivalent
> regular expressions to the same DFA (at the price
> that it only works with
> 8-bit character input). But the Perl package (and
> Perl itself) treats
> the expressions as NFAs and has to dynamically
> search the state space of
> possible matches rather than relying on
> predetermined state transitions.
> "Mastering Regular Expressions" by Jeffrey Friedl
> does a pretty good job
> of explaining how regular expressions typically work
> and how to avoid
> writing "slow" expressions. The trick is to avoid
> constructs that will
> cause a lot of backtracking.
>
> daniel
>
>
____________________________________________________________
Do You Yahoo!?
Get your free @yahoo.co.uk address at http://mail.yahoo.co.uk
or your free @yahoo.ie address at http://mail.yahoo.ie