Posted to oro-user@jakarta.apache.org by "Daniel F. Savarese" <df...@savarese.org> on 2006/03/30 08:13:40 UTC

Re: Perl5Util performance

In message <82...@mail.gmail.com>, "Duke Tantiprasut" writes:
>I'm curious why there is such a significant jump from the Perl5Matcher
>compared to the java.util.regex?

A hefty chunk of that time comes from converting strings to char[] before
matching.  I've tuned that benchmark before and trimmed 25% of the time
just by using PatternMatcherInput instead of String.  It's not exactly
a rigorous benchmark anyway.  Measurements I've made in the past show
that the performance of the packages depends heavily on the input and
how the regular expressions are written.  Two equivalent regular
expressions can have very different performance characteristics.
That said, ORO is behind the times on performance, having been designed
originally to get the most out of JDK 1.0.2.
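To make the point about equivalent expressions concrete, here is a small java.util.regex sketch (Java 11+ for String.repeat; the class and method names are made up for illustration). The two patterns accept exactly the same strings. On an input with no 'b', though, a backtracking engine can try many ways of splitting the run of a's for the nested form before it gives up. Timings vary by JVM and are printed only for comparison. This is not a rigorous benchmark either.

```java
import java.util.regex.Pattern;

public class EquivalentPatterns {
    // Hypothetical example: both expressions accept exactly the same strings,
    // namely one or more 'a' followed by 'b'.
    static final Pattern NESTED = Pattern.compile("(a+)+b");
    static final Pattern FLAT = Pattern.compile("a+b");

    // Returns true when both patterns give the same answer for the input.
    static boolean agree(String input) {
        return NESTED.matcher(input).find() == FLAT.matcher(input).find();
    }

    public static void main(String[] args) {
        // With no 'b' present both must fail, but the nested quantifier lets a
        // backtracking engine try many ways of splitting the run of 'a's first.
        String input = "a".repeat(20) + "c";
        for (Pattern p : new Pattern[] {NESTED, FLAT}) {
            long start = System.nanoTime();
            boolean found = p.matcher(input).find();
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println(p.pattern() + " found=" + found + " in " + micros + " us");
        }
    }
}
```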

A question that bears revisiting is whether Perl5Matcher needs to bother
converting to char[] anymore.  In JDK 1.0.2 and 1.1 days it was a big
performance win, but unless you're working with your input as
char[] from the start, I bet these days it would be faster to not make
the conversion and work directly with String (or CharSequence) if we're
willing to abandon JDK 1.2/1.3 compatibility.  But now that there's
a java.util.regex, the primary reason to use ORO appears to be if you're
still on 1.2/1.3...
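For comparison, java.util.regex (JDK 1.4+) already matches against any CharSequence. Below is a minimal sketch with hypothetical class and method names. If your input starts out as a char[], CharBuffer.wrap gives you a view backed by the same array, so no String copy is made. Whether that is actually faster than copying depends on the JVM and the input, so measure before relying on it.

```java
import java.nio.CharBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharArrayMatching {
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");
    private static final Pattern DIGIT = Pattern.compile("\\d");

    // Counts words in a char[] without first copying it into a String:
    // CharBuffer.wrap returns a CharSequence view backed by the same array.
    static int countWords(char[] text) {
        Matcher m = WORD.matcher(CharBuffer.wrap(text));
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Any CharSequence is accepted, e.g. a StringBuilder that is still being built.
    static boolean containsDigit(CharSequence text) {
        return DIGIT.matcher(text).find();
    }

    public static void main(String[] args) {
        char[] buffer = "ORO was designed for JDK 1.0.2".toCharArray();
        System.out.println(countWords(buffer));                                // 5
        System.out.println(containsDigit(new StringBuilder("release 2.0.8"))); // true
    }
}
```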

In response to the email Subject, Perl5Util is a convenience class and
will always be slower than using Perl5Matcher directly because Perl5Util
has to parse the native Perl-style representation of expressions :(
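Here is a rough illustration of that extra step. Before anything can be compiled, a Perl5Util-style wrapper has to pull the pattern and the modifiers out of the Perl syntax. The sketch below is hypothetical and is not ORO's parser. It handles only /pattern/flags and m/pattern/flags with the i, m, s and x modifiers, and maps those modifiers to java.util.regex flags.

```java
import java.util.regex.Pattern;

public class PerlStyleExpression {
    // Hypothetical helper, not ORO's implementation: turns "/pattern/flags" or
    // "m/pattern/flags" into a java.util.regex.Pattern, showing the parsing
    // that has to happen before the expression can be compiled.
    static Pattern parse(String expression) {
        int start = expression.startsWith("m") ? 1 : 0;
        if (expression.length() < start + 2 || expression.charAt(start) != '/') {
            throw new IllegalArgumentException("expected /pattern/flags: " + expression);
        }
        int end = expression.lastIndexOf('/');
        if (end <= start) {
            throw new IllegalArgumentException("missing closing delimiter: " + expression);
        }
        String regex = expression.substring(start + 1, end);
        int flags = 0;
        // Translate the trailing Perl modifiers into java.util.regex flags.
        for (char c : expression.substring(end + 1).toCharArray()) {
            switch (c) {
                case 'i': flags |= Pattern.CASE_INSENSITIVE; break;
                case 'm': flags |= Pattern.MULTILINE; break;
                case 's': flags |= Pattern.DOTALL; break;
                case 'x': flags |= Pattern.COMMENTS; break;
                default: throw new IllegalArgumentException("unsupported flag: " + c);
            }
        }
        return Pattern.compile(regex, flags);
    }

    public static void main(String[] args) {
        Pattern p = parse("m/perl5util/i");
        System.out.println(p.matcher("Perl5Util performance").find()); // true
    }
}
```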

daniel


---------------------------------------------------------------------
To unsubscribe, e-mail: oro-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: oro-user-help@jakarta.apache.org


Re: Perl5Util performance

Posted by Duke Tantiprasut <du...@gmail.com>.
Any news on when the new release will be available with the changes?

Thanks.

Duke

On 4/1/06, Daniel F. Savarese <df...@savarese.org> wrote:
>
>
> In message <82...@mail.gmail.com>, "Duke Tantiprasut" writes:
> >I think I'm going to stick it out with oro/perl5util. I'd rather provide
> >the flexibility and perl5 familiarity than a little extra speed at this
> >stage. Do you know when you'll get the chance to look at the changes to
> >make it more multi-thread friendly?
>
> I made the change on the trunk this morning.  You'll have to check it
> out with svn and compile it.  I don't know when we'll be cutting a new
> release.  Everything related to ORO is done based on user demand.  The
> change could always be backported to produce a 2.0.9 release because
> the trunk has changes in it that aren't appropriate for 2.0.9 and
> may still change (the engine wrapper interfaces and the implementation of
> a wrapper for java.util.regex).  However, the trunk is stable (i.e., no
> more bugs than 2.0.8), so it's safe to use as you would 2.0.8 even though
> the new stuff may change.  Just read the CHANGES file for a list of
> additions.
>
> >With Perl5Util, doesn't it generate patterns that are cached and used by the
> >Perl5Matcher? I.e., am I correct in assuming that the penalty is only during
> >the initial pattern generation and not during subsequent matching?
>
> Yes, that is correct.  The patterns are generated only the first time
> they are used (or if they subsequently get kicked out of the cache).
> I don't know how much of a performance hit the synchronized method calls
> are these days, but it would have helped in 1.0.2, 1.1, and probably 1.2
> to have avoided synchronizing the methods.  But Perl5Util was a
> user-requested class (AOL actually asked for it) and the whole idea at
> the time was that if you wanted performance, you should use a separate
> matcher in separate threads.  In general, my preference is to push thread
> concerns out of libraries and into applications as much as possible, but
> given the nature of Perl5Util, it does seem kind of weird to me now that
> it uses synchronized methods everywhere and doesn't just use a separate
> matcher for each thread.  On the other hand, if it were to do that, then
> it would be better for Perl5Util to be unsynchronized, leaving it to the
> application to create thread-local Perl5Util instances.  But the request
> at the time was to be able to use a single class instance to perform
> matches in multiple threads.  Less RAM back in those days.
>
> daniel
>
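The per-thread approach described above can be sketched with java.util.regex, where a compiled Pattern is immutable and safe to share but a Matcher is not. This is an illustrative analogue under that assumption, not the change made on the ORO trunk. Each thread keeps its own Matcher in a ThreadLocal, so no call needs to be synchronized. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerThreadMatchers {
    // Compiled patterns are immutable and can be shared freely between threads.
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Matchers hold match state, so each thread lazily gets its own instead of
    // funnelling every call through one synchronized object.
    private static final ThreadLocal<Matcher> MATCHER =
            ThreadLocal.withInitial(() -> NUMBER.matcher(""));

    // Sums every run of digits in the line using this thread's Matcher.
    static int sumNumbers(String line) {
        Matcher m = MATCHER.get().reset(line);
        int sum = 0;
        while (m.find()) {
            sum += Integer.parseInt(m.group());
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Integer>> results = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            String line = "job " + i + " adds " + (i * 10);
            results.add(pool.submit(() -> sumNumbers(line)));
        }
        for (Future<Integer> f : results) {
            System.out.println(f.get());
        }
        pool.shutdown();
    }
}
```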

Re: Perl5Util performance

Posted by Duke Tantiprasut <du...@gmail.com>.
Thanks. I'll have a look Monday/Tuesday next week and let you know if I run
into any hiccups.

Duke

Re: Perl5Util performance

Posted by Duke Tantiprasut <du...@gmail.com>.
Hi Daniel,

I think I'm going to stick it out with oro/perl5util. I'd rather provide
the flexibility and perl5 familiarity than a little extra speed at this
stage. Do you know when you'll get the chance to look at the changes to make
it more multi-thread friendly?

With Perl5Util, doesn't it generate patterns that are cached and used by the
Perl5Matcher? I.e., am I correct in assuming that the penalty is only during
the initial pattern generation and not during subsequent matching?
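In java.util.regex terms, "pay once, match many times" only holds if you keep the compiled pattern around. Here is a minimal sketch (hypothetical names, Java 9+ for List.of). String.matches is documented as equivalent to Pattern.matches, which compiles the expression on every call. A static Pattern, by contrast, is compiled once.

```java
import java.util.List;
import java.util.regex.Pattern;

public class CompileOnce {
    private static final String DATE_REGEX = "\\d{4}-\\d{2}-\\d{2}";

    // Compiled once, when the class is initialized; later calls only match.
    private static final Pattern ISO_DATE = Pattern.compile(DATE_REGEX);

    // String.matches delegates to Pattern.matches, which compiles the regex
    // again on every call, so the compilation cost is paid per line.
    static long countDatesRecompiling(List<String> lines) {
        return lines.stream().filter(s -> s.matches(DATE_REGEX)).count();
    }

    // Reuses the compiled pattern; each call only creates a new Matcher.
    static long countDatesPrecompiled(List<String> lines) {
        return lines.stream().filter(s -> ISO_DATE.matcher(s).matches()).count();
    }

    public static void main(String[] args) {
        List<String> lines = List.of("2006-03-30", "2006-04-01", "not a date");
        System.out.println(countDatesRecompiling(lines) + " " + countDatesPrecompiled(lines)); // 2 2
    }
}
```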

Thanks

Duke

Re: Perl5Util performance

Posted by Duke Tantiprasut <du...@gmail.com>.
Thanks Daniel.

Sounds like I should be moving to java.util.regex. I do like the convenience
of the pattern caching but I guess it's easy enough to set that up myself
for java.util.regex.
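A cache like that for java.util.regex can be as small as a LinkedHashMap kept in access order. The sketch below assumes a least-recently-used eviction policy, similar in spirit to patterns getting "kicked out of the cache". It is not how ORO's Perl5Util cache is implemented, and the class name and capacity are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class PatternCache {
    private final Map<String, Pattern> cache;

    // Keeps at most `capacity` compiled patterns, evicting the least recently used.
    public PatternCache(int capacity) {
        this.cache = new LinkedHashMap<String, Pattern>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Pattern> eldest) {
                return size() > capacity;
            }
        };
    }

    // Compiles the expression the first time it is seen; later calls reuse it.
    // Synchronized because an access-ordered LinkedHashMap mutates even on lookups.
    public synchronized Pattern get(String regex) {
        return cache.computeIfAbsent(regex, Pattern::compile);
    }

    public synchronized int cachedCount() {
        return cache.size();
    }

    public static void main(String[] args) {
        PatternCache patterns = new PatternCache(2);
        System.out.println(patterns.get("a+b").matcher("aaab").matches()); // true
        System.out.println(patterns.get("a+b") == patterns.get("a+b"));    // true
    }
}
```

Synchronizing every call keeps the sketch simple. A ConcurrentHashMap would contend less, but it has no built-in LRU ordering.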

Duke
