Posted to regexp-user@jakarta.apache.org by Millington Roger <ro...@barclaycard.co.uk> on 2001/06/18 17:27:50 UTC

RE: Thread safety

Point taken Daniel - I didn't know about the READ_ONLY_MASK !

From my point of view, objects holding regular expressions are immutable
functional objects in that they should provide only action methods with each
action method being self contained. For example, operations on a regular
expression such as match(), split(), substitute() etc are action methods
with no state being stored within the regular expression. Under these
conditions there is no need to use 'synchronized'.
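A minimal sketch of such an immutable functional object follows. It is an illustration only: it uses java.util.regex (which postdates the libraries discussed here) as a stand-in engine, and the class name ImmutableRegex is hypothetical. The point it shows is that every action method keeps its match state in local variables, so nothing needs to be synchronized.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical immutable regex object: no field changes after construction. */
final class ImmutableRegex {
    private final Pattern pattern; // compiled once, never modified

    ImmutableRegex(String expression) {
        this.pattern = Pattern.compile(expression);
    }

    /** True if the expression matches anywhere in the input. */
    boolean match(String input) {
        // The Matcher is a local variable, so no match state is stored in this object.
        return pattern.matcher(input).find();
    }

    /** Splits the input around each match of the expression. */
    String[] split(String input) {
        return pattern.split(input);
    }

    /** Replaces every match with the literal replacement text. */
    String substitute(String input, String replacement) {
        return pattern.matcher(input).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        ImmutableRegex commas = new ImmutableRegex("\\s*,\\s*");
        System.out.println(commas.match("a , b"));
        System.out.println(String.join("|", commas.split("a , b,c")));
        System.out.println(commas.substitute("a , b", ";"));
    }
}
```

A single ImmutableRegex instance can be handed to any number of threads, because each call is self contained.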

I was not trying to criticize the 'jakarta-oro' library - I like it - I was
just pointing out that I can't prove that it is thread safe. I do criticize
'jakarta-regexp-1.2' - its regular expression objects are not thread safe.
Furthermore, the only way to make jakarta-regexp-1.2 regular expression
objects thread safe is to place synchronized { ... } around the statements
required to be atomic.
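The kind of locking meant here can be sketched as follows. StatefulRegex is a hypothetical stand-in, built on java.util.regex, for a regular expression object that records its last match in a field; it is not the jakarta-regexp API. The assumption is that one call performs the match and a later call reads the result back, so both calls must sit inside one synchronized block when the object is shared.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical regex object that stores the last match inside itself. */
final class StatefulRegex {
    private final Pattern pattern;
    private Matcher lastMatch; // mutable state, overwritten by every match() call

    StatefulRegex(String expression) {
        this.pattern = Pattern.compile(expression);
    }

    boolean match(String input) {
        lastMatch = pattern.matcher(input);
        return lastMatch.find();
    }

    /** Reads a group of the most recent successful match. */
    String group(int index) {
        return lastMatch.group(index);
    }
}

/** Caller-side locking around the statements that must be atomic. */
final class SharedRegexClient {
    private final StatefulRegex shared;

    SharedRegexClient(StatefulRegex shared) {
        this.shared = shared;
    }

    /** Returns the first capture group, or null if the input does not match. */
    String firstGroup(String input) {
        // Without this lock, another thread could call match() between our
        // match() and group(), and we would read that thread's result.
        synchronized (shared) {
            return shared.match(input) ? shared.group(1) : null;
        }
    }

    public static void main(String[] args) {
        SharedRegexClient client = new SharedRegexClient(new StatefulRegex("id=(\\d+)"));
        System.out.println(client.firstGroup("id=42"));
    }
}
```

The lock only helps if every user of the shared object takes the same lock, which is why this approach puts the burden on the caller rather than the library.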

I have written a set of functional interfaces to provide all the standard
operations that I use. I have written implementations for 6 different RE
libraries and am using them to test the performance of each of the
libraries. The only ones I have to use synchronized with are
'jakarta-regexp-1.2' and Rex.

I have found problems with the semantics of 'split()' in regex4j. As noted
previously, I have found two problems with 'jakarta-regexp-1.2'.

As far as speed is concerned, for my limited set of tests 'regex4j' looks to
be fastest with 'jakarta-oro' just slightly behind and 'jakarta-regexp-1.2'
just behind 'jakarta-oro'. GNU is very slow - factors of 10. Rex varies
dramatically. 'jakarta-regexp-1.2' seems to be slow when there is
significant backtracking. 

If I ever get the time I will formalize this!

Regards
  Roger



> -----Original Message-----
> From:	Daniel F. Savarese [SMTP:dfs@savarese.org]
> Sent:	18 June 2001 15:43
> To:	Millington Roger
> Subject:	Re: Thread safety 
> 
> 
> >safe. 'jakarta-oro-2.0.x' may be thread safe but I need to be sure that the
> >pattern matcher classes hold all the match information. 'regex4j' looks to
> 
> You have to be careful when using the words thread safe.
> Different libraries have different philosophical approaches to
> dealing with threads.  Some synchronize everything themselves, others
> put the onus on the programmer.  A thread safe library is only thread
> safe if used properly.  Furthermore, jakarta-oro has a lot of classes
> in several different packages, so you can't talk about "jakarta-oro"
> being thread safe; you need to reference specific classes or packages.
> 
> The original code that led to jakarta-oro was written in the JDK 1.0.2
> days, when synchronization was an unacceptably high overhead.  The design
> of the packages, specifically .regex, is for you to use separate
> Perl5Compiler and Perl5Matcher objects per thread.  These are
> lightweight objects and doing so relieves the need to use synchronization.
> If you want to share a Perl5Pattern between threads, you must compile it
> with READ_ONLY_MASK (see javadocs for Perl5Compiler.READ_ONLY_MASK for an
> explanation).  The .awk package has no such restriction; AwkPattern can
> be freely shared between AwkMatcher instances.  The Perl5Util convenience
> class is synchronized, so if using that, there's nothing special to do,
> but you sacrifice speed.
> 
> I am only writing this because we haven't written a user's guide for
> jakarta-oro yet and if you don't read the javadocs carefully, you
> won't get it.  It was explained in the old OROMatcher 1.0 user's guide.
> At any rate, use and read the documentation for
> Perl5Compiler.READ_ONLY_MASK before drawing any conclusions.
> 
> daniel
> 
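The design Daniel describes (one compiled pattern that is safe to share, plus a separate lightweight matcher per thread) can be sketched as below. This is an analogy only: it uses java.util.regex rather than jakarta-oro, with the immutable Pattern playing the role of a pattern compiled with READ_ONLY_MASK and a per-thread Matcher playing the role of a per-thread Perl5Matcher. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PerThreadMatcherDemo {
    // Shared by all threads; safe because a compiled Pattern is immutable.
    static final Pattern WORD = Pattern.compile("\\w+");

    /** Counts matches using a matcher that only the calling thread can see. */
    static int countWords(String input) {
        Matcher matcher = WORD.matcher(input); // created per call, never shared
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    /** Runs one worker thread per input; no synchronized block is needed. */
    static Map<String, Integer> countInParallel(String... inputs) {
        Map<String, Integer> results = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[inputs.length];
        for (int i = 0; i < inputs.length; i++) {
            final String input = inputs[i];
            workers[i] = new Thread(() -> results.put(input, countWords(input)));
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(countInParallel("one two", "three four five"));
    }
}
```

The only shared mutable object is the result map, which is a concurrent collection; all match state stays inside each thread's own matcher.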

