Posted to oro-dev@jakarta.apache.org by Steve Kearns <sk...@hotmail.com> on 2001/01/05 07:22:44 UTC

Re: New Regular Expression Package: JavaRegex2

It seems that your primary goal, compatibility with
Perl, is incompatible with my primary goal, which was
a maximally useful regular expression package.
For me, compatibility with Perl is a nice secondary
goal, but it is secondary to good functionality for
Java programmers.  After all, we are supporting
Java programmers and not Perl programmers.

For instance, pattern definitions are an obvious
incompatibility with Perl and with the existing ORO
code.

The key benefit of JavaRegex2, as you point out, is
its compiled (DFA-based) pattern matching.  Given that,
a designer has a fundamental choice to make:
If you want interpreted and compiled modes to be
switchable just by flipping a flag, then you have to
disallow backreferences, which cannot
be implemented in compiled mode.  The only
other choice is to allow backreferences in
interpreted patterns but not in compiled ones, which would
be a bit more complicated for users of the package.

In my experience, backreferences are sometimes useful,
but only in about 2% of the patterns I have ever had to write.
For those 2% they are REALLY useful, though, so
perhaps a full-featured regex package should support
backreferences in interpreted mode.
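
To make the backreference point concrete, here is a minimal sketch of
the kind of pattern in question: finding a doubled word such as "is is".
It uses java.util.regex purely as a stand-in backtracking matcher; that
choice, the class name BackrefDemo, and the method firstDoubledWord are
assumptions for illustration, not part of JavaRegex2 or ORO.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackrefDemo {
    // \b(\w+)\s+\1\b : a word, some whitespace, then the SAME word again.
    // The \1 must repeat whatever group 1 captured, which a plain DFA
    // has no way to remember.
    private static final Pattern DOUBLED =
        Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    /** Returns the first doubled word in text, or null if there is none. */
    public static String firstDoubledWord(String text) {
        Matcher m = DOUBLED.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstDoubledWord("this is is a test"));
        System.out.println(firstDoubledWord("no repeats here"));
    }
}
```

Because \1 has to match exactly the text captured earlier, this pattern
describes a language that is not regular, which is why it only fits an
interpreted (backtracking) mode.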

As soon as you have compiled patterns, you need
external files and precompiled patterns; these
would of course require extensions to the ORO interfaces.
Likewise, as soon as you have named matches instead of
numeric indices, important changes to the ORO interfaces
are required.
But I detect substantial resistance to such changes.
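
The difference between named matches and numeric indices can be
sketched as follows.  This again uses java.util.regex and its
(?<name>...) group syntax as a stand-in; the actual JavaRegex2 syntax
for named matches is not shown in this thread, and the class
NamedGroupDemo and its methods are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupDemo {
    // One pattern, three groups that can be read by index or by name.
    private static final Pattern DATE =
        Pattern.compile("(?<year>\\d{4})/(?<month>\\d{2})/(?<day>\\d{2})");

    /** Numeric access: the caller must know that the year is group 1. */
    public static String yearByIndex(String s) {
        Matcher m = DATE.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    /** Named access: keeps working if groups are added or reordered. */
    public static String yearByName(String s) {
        Matcher m = DATE.matcher(s);
        return m.find() ? m.group("year") : null;
    }

    public static void main(String[] args) {
        System.out.println(yearByIndex("posted on 2001/01/05"));
        System.out.println(yearByName("posted on 2001/01/05"));
    }
}
```

Supporting lookup by name means the match-result interface needs a
method that takes a name instead of an index, which is the kind of
interface change referred to above.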

Given this, it seems that JavaRegex2 is best
deployed as a separate project from ORO.

>From: "Daniel F. Savarese" <df...@savarese.org>
>Reply-To: oro-dev@jakarta.apache.org
>To: oro-dev@jakarta.apache.org
>Subject: Re: New Regular Expression Package: JavaRegex2
>Date: Tue, 02 Jan 2001 01:32:29 -0500
>
>
> >JavaRegex2 offers significant improvements over ORO, which we
> >now describe.
>
>The only purpose of the org.apache.oro.text.regex package is to
>provide 100% compatible Perl regular expressions in Java.  It
>is a bit out of date, complying with Perl 5.003.  If you've
>got an up to date Perl 5.6 compatible implementation, by all
>means, let's see about integrating it.  However, other than
>the known unicode character class deficiency (for which there is a
>non-optimal, but working patch; and fixing it the "right" way is
>just a matter of someone with more time than myself to take the time
>to do it), I don't see that you listed anything that can't be done with
>jakarta-oro.  Precompiled patterns can be serialized and written to files,
>if serialization isn't desired it's simple for an application to read
>expressions from a file and compile them (I don't think this belongs in
>the library, but others may disagree), patterns can be precompiled and
>cached, named and labeled submatching can be implemented on top
>of the library but don't fit in with the Perl-compatibility goal.
>
> >JavaRegex2 offers maximal compatibility with the Perl5 regular
> >expression language, given the few new syntax additions to
> >support all of the new features.
>
>If it's Perl 5.6 compatible, this is very much in line with where
>we need to go.
>
> >All Perl5 regular expression language features have been implemented,
> >except for the few features which are not
> >possible to implement with a compiled regular expression:
> >
> >* backreferences (\1, \2) cannot be implemented.
>
>This is a big problem and why the ORO package was implemented along the
>lines of Henry Spencer's package and Perl, using an at times less
>efficient NFA implementation.  Backreferences are very important to
>users of the library.
>
> >Also, all the new features do not shoehorn into the ORO
> >interfaces, so unfortunately users would have to
> >learn to use a slightly different interface.
>
>If we were to integrate your package, the ORO interfaces could be
>extended, but we can't break people's existing code other than
>through the fact that Perl 5.6 expressions behave differently than
>Perl 5.003 expressions.
>
> >expressions".  I can most likely get permission
> >to open source it but only want to go through all of the
> >necessary paperwork if it is likely to be
> >adopted.  Please provide appropriate feedback.
>
>There's no way to say without seeing your code.  Thousands of developers
>use the ORO library in either the current jakarta-oro form or the original
>OROMatcher incarnation.  We can't bet their existing investment solely
>on what you've described.  The only two key things you offer, and I
>emphasize that these are important things, are more current (v5.6)
>Perl regex support and potentially faster DFA-based pattern matching.
>The most obvious problem is the lack of backreference support.  The only
>way to know how we can proceed is by seeing your code.  What have you
>got to lose by going through the paperwork other than some time?  Can
>you at least make the binary library and API documentation available
>for evaluation?
>
>daniel
>
>



Re: New Regular Expression Package: JavaRegex2

Posted by Jon Stevens <jo...@latchkey.com>.
on 1/4/2001 10:22 PM, "Steve Kearns" <sk...@hotmail.com> wrote:

> Given this, it seems that JavaRegex2 is best
> deployed as a separate project from ORO.

I agree. sourceforge.net is a great place to host your project.

good luck and thanks.

-jon