Posted to oro-dev@jakarta.apache.org by Ed Chidester <ec...@textwise.com> on 2001/05/11 21:49:18 UTC
Comparing oro and regexp
Daniel,
I was not implying that the ORO classes are larger than they should be. I was
merely comparing the physical sizes of the regexp and oro jars. Although I
probably won't personally make use of separate "slice" jars, it makes a lot
of sense to offer that option for developers who only need one or two subsets
of the functionality oro offers (especially when more general text processing
is added).
Now, to supply the sample benchmark code that backs up my
"performance anecdotes"...
I compared the ORO Perl5Pattern/Perl5Matcher classes with the Regexp RE
class (although I also compared the Perl5Util and the RE classes with similar
results). I'm attaching a (slightly modified) copy of the code on which I
based my previous speed assertions. The modifications, aside from adding
documentation, amount to removing many of the patterns that I tested. This
was meant as a highly targeted test of regular expression match times
(making no comparison of substitution times) and it's possible that a more
varied selection of patterns will yield different results. I ran this code
using Java HotSpot(TM) Client VM (build 1.3.0-C, mixed mode) on an 800 MHz
PIII with 256 MB RAM.
I believe that I am using both the Regexp and ORO packages in an efficient
manner. But, please let me know if this is not the case.
Thanks,
Ed.
Disclaimer: The attached code was written as a quick test and does not
necessarily reflect my best judgment in coding practices. Also, I used
synchronized blocks in an effort to ensure that Java's dynamic class loading
didn't cause erroneous timing results. If this is completely off-base, please
let me know of a better approach.
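
For reference, one common alternative to the synchronized blocks is an untimed
warm-up pass, so that class loading and JIT compilation happen before the clock
starts (synchronized blocks govern thread locking, not class loading). The
sketch below is not the attached benchmark. The class WarmupBenchmark, the
MatchTask interface, and the method names are hypothetical, and
java.util.regex from later JDKs (with Java 8 lambdas) is used purely as a
stand-in so the example runs without the ORO or Regexp jars.

```java
import java.util.regex.Pattern;

// Hypothetical harness; none of these names come from the attached code.
final class WarmupBenchmark {

    /** Hypothetical callback wrapping whichever regex engine is under test. */
    interface MatchTask {
        boolean matches(String input);
    }

    /** Runs the task over every input and returns how many inputs matched. */
    static int countMatches(MatchTask task, String[] inputs) {
        int hits = 0;
        for (String input : inputs) {
            if (task.matches(input)) {
                hits++;
            }
        }
        return hits;
    }

    /**
     * Runs untimed warm-up rounds first, then returns the elapsed
     * nanoseconds for the timed rounds only.
     */
    static long timeNanos(MatchTask task, String[] inputs,
                          int warmupRounds, int timedRounds) {
        int sink = 0;
        // Warm-up: triggers class loading and JIT compilation, not timed.
        for (int i = 0; i < warmupRounds; i++) {
            sink += countMatches(task, inputs);
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRounds; i++) {
            sink += countMatches(task, inputs);
        }
        long elapsed = System.nanoTime() - start;
        // Use the accumulated result so the loops cannot be optimized away.
        if (sink < 0) {
            throw new IllegalStateException("unexpected negative match count");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Stand-in engine: java.util.regex, compiled once outside the loop.
        Pattern digits = Pattern.compile("\\d+");
        MatchTask task = input -> digits.matcher(input).find();
        String[] inputs = {"abc123", "no digits here", "42"};
        long nanos = timeNanos(task, inputs, 1000, 1000);
        System.out.println("timed rounds took " + nanos + " ns");
    }
}
```

To compare two engines, each would be wrapped in its own MatchTask and passed
through timeNanos with the same inputs and round counts.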
On Thu, 10 May 2001 "Daniel F. Savarese" <df...@savarese.org> wrote:
>
>
> >Basically, the regexp package is smaller and has a reduced feature set.
> >In fact, the regexp package jar file is less than half the size of the oro
> >package jar.
>
> The feeling that jakarta-oro is large is a common misconception. The size
> of what used to be OROMatcher is very small. All you need for
> regular expressions is the org.apache.oro.text.regex package, not all
> of the other stuff. To alleviate this misconception, we're going to
> provide a jakarta-oro jar that has everything and then separate jars for
> strictly those slices that people want, roughly corresponding to the old
> OROMatcher, PerlTools, AwkTools, and TextTools packages.
>
> >Initially, regexp handles matching (and rejecting matches) more quickly. But,
> >after a few hundred matches, the time required by the regexp package
> >(especially in rejecting matches) increases considerably when compared to
> >the oro package.
>
> This is another misconception, although not directly in relation to
> the regexp package. The jakarta-oro package has 4 different regular
> expression packages. So when you compare performance, you have to
> specify which one. Also, a lot of times people talk about jakarta-oro
> when they really mean the Perl5Util class, which is a convenience
> wrapper around the org.apache.oro.text.regex package. Perl5Util will
> always be slow (although we can improve its performance) because it
> does a higher level set of parsing so that you can use Perl-specific
> syntactic sugar like 's/foobar/barfoo/g' instead of the allegedly
> more cumbersome approach of directly using the org.apache.oro.text.regex
> classes. Furthermore, most people blatantly misuse the
> org.apache.oro.text.regex package by constantly reinstantiating
> Perl5Compiler and Perl5Matcher instances and constantly recompiling
> regular expressions. Hopefully this will stop after we write a new
> user's guide explaining how to make proper use of the package.
>
> A valid performance comparison can only be made by posting the code used
> to make the comparison. I don't know how you reached the assessment you
> made. All performance evaluation code is welcome on oro-dev because
> even though the primary goal for at least the Perl related stuff is to
> achieve compatibility with Perl, the secondary goal is to be as fast
> as possible within the constraints of Perl's regex syntax and Java's
> runtime performance.
>
> daniel
>
>
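
The misuse Daniel describes above (recompiling the expression and creating new
matcher objects for every input) can be contrasted with the compile-once
pattern in a small sketch. This is an illustration under assumptions, not code
from either package: java.util.regex from later JDKs stands in for the
org.apache.oro.text.regex classes so the example is self-contained, and the
class and method names are hypothetical. With ORO, the analogous approach
would be to keep one Perl5Compiler, one compiled Pattern, and one Perl5Matcher
and reuse them across inputs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; java.util.regex is a stand-in for ORO here.
final class CompileOnceSketch {

    /** Wasteful: recompiles the expression and builds a matcher per input. */
    static int countRecompiling(String regex, String[] inputs) {
        int hits = 0;
        for (String input : inputs) {
            if (Pattern.compile(regex).matcher(input).find()) {
                hits++;
            }
        }
        return hits;
    }

    /** Preferred: compile once and reuse a single matcher via reset(). */
    static int countReusing(String regex, String[] inputs) {
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher("");
        int hits = 0;
        for (String input : inputs) {
            // reset(input) points the existing matcher at the new input.
            if (matcher.reset(input).find()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] inputs = {"foobar", "barfoo", "nothing to see"};
        System.out.println(countRecompiling("foo", inputs));
        System.out.println(countReusing("foo", inputs));
    }
}
```

Both methods return the same counts; only the amount of repeated compilation
and object creation differs, which is the cost a fair benchmark should avoid
charging to the matching step.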