Posted to oro-dev@jakarta.apache.org by Ed Chidester <ec...@textwise.com> on 2001/05/11 21:49:18 UTC

Comparing oro and regexp

Daniel,

I was not implying that the ORO classes are larger than they should be. I was 
merely comparing the physical sizes of the regexp and oro jars. Although I 
probably won't personally make use of separate "slice" jars, it makes a lot 
of sense to offer that option for developers who only need one or two subsets 
of the functionality oro offers (especially when more general text processing 
is added).

Now, to supply the sample benchmark code that backs up my 
"performance anecdotes"...

I compared the ORO Perl5Pattern/Perl5Matcher classes with the Regexp RE 
class (although I also compared the Perl5Util and the RE classes with similar 
results). I'm attaching a (slightly modified) copy of the code on which I 
based my previous speed assertions. The modifications, aside from adding 
documentation, amount to removing many of the patterns that I tested. This 
was meant as a highly targeted test of regular expression match times 
(making no comparison of substitution times), and it's possible that a more 
varied selection of patterns would yield different results. I ran this code 
using the Java HotSpot(TM) Client VM (build 1.3.0-C, mixed mode) on an 
800 MHz PIII with 256 MB of RAM.


I believe that I am using both the Regexp and ORO packages in an efficient
manner, but please let me know if this is not the case.


Thanks,

Ed.


Disclaimer: The attached code was written as a quick test and does not 
  necessarily reflect my best judgment in coding practices. Also, I used 
  synchronized blocks in an effort to ensure that Java's dynamic class loading 
  didn't cause erroneous timing results. If this is completely off-base, please 
  let me know of a better approach.
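
One alternative to the synchronized blocks, offered only as a rough sketch: 
run the match loop untimed for a while first, so that class loading and most 
JIT compilation should be out of the way before the clock starts. The sketch 
below uses java.util.regex (from a later JDK) purely as a stand-in so that it 
runs without either jar. It is not the attached benchmark, and the names 
MatchTimer, countMatches and timeMatches are made up for illustration. The 
round counts are arbitrary assumptions, not tuned values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchTimer {

    /** Runs every input against the pattern `rounds` times and returns the hit count. */
    static long countMatches(Pattern pattern, String[] inputs, int rounds) {
        // Reuse one Matcher via reset() instead of creating a new one per input.
        Matcher matcher = pattern.matcher("");
        long hits = 0;
        for (int r = 0; r < rounds; r++) {
            for (String input : inputs) {
                if (matcher.reset(input).find()) {
                    hits++;
                }
            }
        }
        return hits;
    }

    /** Warms up untimed, then returns the elapsed nanoseconds for the timed rounds. */
    static long timeMatches(Pattern pattern, String[] inputs,
                            int warmupRounds, int timedRounds) {
        // Untimed warm-up: class loading and most JIT work should happen here.
        countMatches(pattern, inputs, warmupRounds);

        long start = System.nanoTime();
        long hits = countMatches(pattern, inputs, timedRounds);
        long elapsed = System.nanoTime() - start;

        // Touch the result so the timed loop cannot be discarded as dead code.
        if (hits < 0) {
            throw new IllegalStateException("hit count cannot be negative");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // The pattern is compiled once, outside the timed region.
        Pattern pattern = Pattern.compile("a+b");
        String[] inputs = {"xxaab", "no match here", "ab"};
        long nanos = timeMatches(pattern, inputs, 10_000, 100_000);
        System.out.println("timed rounds took " + nanos / 1_000_000.0 + " ms");
    }
}
```

The same shape should carry over to Perl5Matcher or RE by swapping the body of 
countMatches. The pattern is still compiled once, and the timed region 
contains only matching.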


On Thu, 10 May 2001 "Daniel F. Savarese" <df...@savarese.org> wrote:
> 
> 
> >Basically, the regexp package is smaller and has a reduced feature set.
> >In fact, the regexp package jar file is less than half the size of the oro
> >package jar.
> 
> The feeling that jakarta-oro is large is a common misconception.  The size
> of what used to be OROMatcher is very small.  All you need for
> regular expressions is the org.apache.oro.text.regex package, not all
> of the other stuff.  To alleviate this misconception, we're going to
> provide a jakarta-oro jar that has everything and then separate jars for
> strictly those slices that people want, roughly corresponding to the old
> OROMatcher, PerlTools, AwkTools, and TextTools packages.
> 
> >Initially, regexp handles matching (and rejecting matches) more quickly. But,
> >after a few hundred matches, the time required by the regexp package
> >(especially in rejecting matches) increases considerably when compared to
> >the oro package.
> 
> This is also another misconception, although not directly in relation to
> the regexp package.  The jakarta-oro package has 4 different regular
> expression packages.  So when you compare performance, you have to
> specify which one.  Also, a lot of times people talk about jakarta-oro
> when they really mean the Perl5Util class, which is a convenience
> wrapper around the org.apache.oro.text.regex package.  Perl5Util will
> always be slow (although we can improve its performance) because it
> does a higher level set of parsing so that you can use Perl-specific
> syntactic sugar like 's/foobar/barfoo/g' instead of the allegedly
> more cumbersome approach of directly using the org.apache.oro.text.regex
> classes.  Furthermore, most people blatantly misuse the
> org.apache.oro.text.regex package by constantly reinstantiating
> Perl5Compiler and Perl5Matcher instances and constantly recompiling
> regular expressions.  Hopefully this will stop after we write a new
> user's guide explaining how to make proper use of the package.
> 
> A valid performance comparison can only be made by posting the code used
> to make the comparison.  I don't know how you reached the assessment you
> made.  All performance evaluation code is welcome on oro-dev because
> even though the primary goal for at least the Perl related stuff is to
> achieve compatibility with Perl, the secondary goal is to be as fast
> as possible within the constraints of Perl's regex syntax and Java's
> runtime performance.
> 
> daniel
> 
>