Posted to oro-dev@jakarta.apache.org by "Daniel F. Savarese" <df...@savarese.org> on 2001/05/16 10:40:04 UTC

Re: Comparing oro and regexp

>Now, to supply the sample benchmark code that backs up my 
>"performance anecdotes"...

Thanks!  Could you also include the regular expressions that you tested
and the input?  That will help us get started on both regression and
performance tests.

>I believe that I am using both the Regexp and ORO packages in an efficient
>manner. But, please let me know if this is not the case.

I haven't absorbed the entire piece of code yet, but something that's going
to skew your timings is your repeated creation of new PatternMatcherInput
instances inside your loops.  You should create only one instance
of PatternMatcherInput and use setInput() to change the input.  The
jakarta-oro package is designed so that you can minimize memory overhead
by reusing object instances.  At least you didn't repeatedly instantiate
Perl5Compiler and Perl5Matcher objects, which is the most common mistake
people make and what I had in mind when talking about "performance anecdotes."
For example, I just received a code example from someone who is still
using OROMatcher (I asked them to upgrade to jakarta-oro) and was
reporting a possible bug.  The example was a method that, on
every call, created a Perl5Compiler instance, a Perl5Matcher instance, and
a Perl5Substitution instance.  Those allocations should have been
performed only once and the objects reused.
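
A minimal sketch of the reuse pattern described above, assuming the
jakarta-oro jar is on the classpath.  The class name ReuseSketch, the
method countMatches, and the sample inputs are hypothetical and are not
taken from the benchmark under discussion.  The compiler, matcher, and
PatternMatcherInput are each created once, and setInput() swaps in each
new piece of input:

```java
import org.apache.oro.text.regex.MalformedPatternException;
import org.apache.oro.text.regex.Pattern;
import org.apache.oro.text.regex.PatternMatcherInput;
import org.apache.oro.text.regex.Perl5Compiler;
import org.apache.oro.text.regex.Perl5Matcher;

// Hypothetical example class; only the org.apache.oro types are real.
public class ReuseSketch {

    // Counts all matches of regex across the given lines.
    public static int countMatches(String regex, String[] lines) {
        // Created once per call here; in a long-running program these
        // would typically be created once and kept around.
        Perl5Compiler compiler = new Perl5Compiler();
        Perl5Matcher matcher = new Perl5Matcher();

        Pattern pattern;
        try {
            pattern = compiler.compile(regex);
        } catch (MalformedPatternException e) {
            throw new IllegalArgumentException("Bad pattern: " + regex);
        }

        // One input object for the whole loop, instead of a new one
        // per line.
        PatternMatcherInput input = new PatternMatcherInput("");

        int count = 0;
        for (String line : lines) {
            input.setInput(line);  // reuse rather than reallocate
            // contains() advances the input's offset past each match.
            while (matcher.contains(input, pattern)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] lines = { "a1 b22", "none", "333" };
        System.out.println(countMatches("\\d+", lines));
    }
}
```

Moving the "new PatternMatcherInput(...)" call back inside the loop gives
the allocation-per-iteration version, which is the difference that would
show up in the timings.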

That said, I'm very curious to see if HotSpot's runtime inlining makes
it practical to work directly with String using charAt() rather than
converting to a char[] every time.  We can isolate the overhead of
toCharArray() by running your test with a String as input and then with
the equivalent char[] as input.  For all but the shortest pieces of
input, it used to be better to take the hit of using toCharArray()
because it was greatly outweighed by the performance gain of working
with arrays.  HotSpot may change that depending on how it optimizes
string element access.
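
A rough sketch of how the three access paths could be compared in
isolation, outside of any regular expression engine.  Everything here
(the class StringAccessSketch, the method names, the input, and the
round count) is made up for illustration, and the timing loop is a crude
wall-clock measurement rather than a rigorous benchmark, so treat the
numbers it prints as indicative at best:

```java
// Hypothetical example; not part of jakarta-oro or the posted benchmark.
public class StringAccessSketch {

    // Walks the String directly, paying a charAt() call per character.
    public static int countWithCharAt(String s, char target) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == target) count++;
        }
        return count;
    }

    // Scans an already-converted array; no conversion cost is included.
    public static int countInArray(char[] chars, char target) {
        int count = 0;
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] == target) count++;
        }
        return count;
    }

    // Converts on every call, so the toCharArray() copy is part of
    // the measured cost.
    public static int countWithCharArray(String s, char target) {
        return countInArray(s.toCharArray(), target);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) sb.append("abcdefghij");
        String text = sb.toString();
        char[] chars = text.toCharArray();
        int rounds = 1000;
        long sink = 0;  // accumulated so the loops are not discarded

        // Warm-up so the JIT has a chance to compile the hot methods.
        for (int i = 0; i < rounds; i++) {
            sink += countWithCharAt(text, 'e');
            sink += countWithCharArray(text, 'e');
            sink += countInArray(chars, 'e');
        }

        long t0 = System.nanoTime();
        for (int i = 0; i < rounds; i++) sink += countWithCharAt(text, 'e');
        long t1 = System.nanoTime();
        for (int i = 0; i < rounds; i++) sink += countWithCharArray(text, 'e');
        long t2 = System.nanoTime();
        for (int i = 0; i < rounds; i++) sink += countInArray(chars, 'e');
        long t3 = System.nanoTime();

        System.out.println("charAt on String:     " + (t1 - t0) / 1000000 + " ms");
        System.out.println("toCharArray per call: " + (t2 - t1) / 1000000 + " ms");
        System.out.println("pre-converted char[]: " + (t3 - t2) / 1000000 + " ms");
        System.out.println("(checksum " + sink + ")");
    }
}
```

The gap between the second and third figures approximates the cost of
toCharArray() itself, and the gap between the first and third shows how
much charAt() costs relative to direct array access on a given VM.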

At any rate, please continue to share your observations so we can use 
them to improve the software.

daniel