Posted to regexp-dev@jakarta.apache.org by "Daniel F. Savarese" <df...@savarese.org> on 2001/07/12 02:22:33 UTC

Re: "\d{0,4}" doesn't work?

In message <3B...@localhost>, "Michael McCallum" writes:
>I like Daniels idea of the posix compliance.

I thought the original goal of regexp was to implement POSIX expressions,
but I may be wrong.

>What do you see as the main tasks necessary to fold regexp in as the posix
>package?

The following four classes would need to be implemented:

public class PosixCompiler implements PatternCompiler {
...
}

public class PosixMatcher implements PatternMatcher {
...
}

public class PosixPattern implements Pattern {
...
}

public class PosixMatchResult implements MatchResult {
...
}

and, I suppose, also the trivial-to-implement
org.apache.oro.io.PosixFilenameFilter.

Implementing these classes isn't so hard and just requires recasting the
existing code into a different mold.  After they are implemented, you get
split, substitute, pattern caching, and a few other things for free from
the generic code in some of the other packages.  The question of
backward compatibility can then be addressed and a decision made as to
whether or not to implement some facade classes that implement the old
API for backward compatibility.
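
To illustrate why utilities such as split come for free, here is a minimal
sketch of a generic split written only against interfaces.  SimplePattern,
SimpleMatcher, JdkPattern, JdkMatcher, and GenericUtil are hypothetical,
heavily simplified stand-ins, not the real org.apache.oro.text.regex types,
and java.util.regex is used as the engine purely so the example runs on its
own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;

// Hypothetical stand-ins for the idea behind the ORO interfaces.
// The real interfaces have more methods and different signatures.
interface SimplePattern {
    String source();
}

interface SimpleMatcher {
    // Returns {start, end} of the first match at or after 'from', or null if none.
    int[] find(String input, SimplePattern pattern, int from);
}

// One concrete engine, backed by java.util.regex for illustration only.
final class JdkPattern implements SimplePattern {
    final java.util.regex.Pattern compiled;

    JdkPattern(String regex) {
        this.compiled = java.util.regex.Pattern.compile(regex);
    }

    public String source() {
        return compiled.pattern();
    }
}

final class JdkMatcher implements SimpleMatcher {
    public int[] find(String input, SimplePattern pattern, int from) {
        // This engine only understands its own pattern type.
        Matcher m = ((JdkPattern) pattern).compiled.matcher(input);
        return m.find(from) ? new int[] { m.start(), m.end() } : null;
    }
}

// Generic utility code: it never refers to a concrete engine.
final class GenericUtil {
    static List<String> split(SimpleMatcher matcher, SimplePattern pattern, String input) {
        List<String> pieces = new ArrayList<>();
        int pieceStart = 0;  // where the next piece begins
        int searchFrom = 0;  // where the next search begins
        while (searchFrom <= input.length()) {
            int[] span = matcher.find(input, pattern, searchFrom);
            if (span == null) {
                break;
            }
            if (span[0] == span[1]) {
                // Skip empty matches so the loop always advances.
                searchFrom = span[1] + 1;
                continue;
            }
            pieces.add(input.substring(pieceStart, span[0]));
            pieceStart = span[1];
            searchFrom = span[1];
        }
        pieces.add(input.substring(pieceStart));
        return pieces;
    }
}
```

Under these assumptions, a Posix engine would only have to supply its own
pattern and matcher classes; GenericUtil.split(new JdkMatcher(),
new JdkPattern(","), "a,b,,c") would work the same way with any other
implementation of the two interfaces.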

I'd favor forgetting about backward compatibility and starting a new
discussion on creating a generic programmer-friendly API in the spirit
of the RE class and taking into account developments in JDK 1.4.  I've
observed that users of Java regex packages fall into two camps.  One places
priority on ease of use (usually measured as fewer lines of code).  They
tend to use Jakarta regexp, GNU regexp, or jakarta-oro Perl5Util.  The
other places priority on flexibility and performance at the expense of
having to write extra code.  They tend to use (and sometimes misuse :) the
jakarta-oro core classes.  There is no reason a generic package (meaning you
can choose Awk, Glob, Perl5, POSIX, or any other syntax implemented with the
org.apache.oro.text.regex interfaces) couldn't provide both ease of use
and some flexibility and efficiency.  I could see a class that looks something
like RE, but allows the choice of syntax and internally shares a class global
pattern cache for compiling patterns, but uses a separate PatternMatcher per
instance.  Unfortunately, the JDK 1.4 stuff is a monolithic implementation and
not a set of interfaces, so I'm not sure there's much we can do to be
"compatible" with what will become the officially sanctioned standard.
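
As a rough sketch of the kind of class described above: a facade that lets
the caller pick a syntax, shares one class-global pattern cache, and keeps a
separate matcher per instance.  EasyRegex, its Syntax enum, and every method
name are hypothetical and exist in neither package.  The two "syntaxes" are
toy translations onto java.util.regex, standing in for real Awk, Glob, Perl5,
or POSIX compilers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical facade; not part of jakarta-oro or Jakarta regexp.
final class EasyRegex {

    // Stand-in syntaxes: each one translates its input to a java.util.regex expression.
    enum Syntax {
        REGEX {
            String translate(String expr) {
                return expr;
            }
        },
        GLOB {
            // Very simplified glob: '*' and '?' only, everything else is literal.
            String translate(String expr) {
                StringBuilder sb = new StringBuilder();
                for (char c : expr.toCharArray()) {
                    if (c == '*') {
                        sb.append(".*");
                    } else if (c == '?') {
                        sb.append('.');
                    } else {
                        sb.append(Pattern.quote(String.valueOf(c)));
                    }
                }
                return sb.toString();
            }
        };

        abstract String translate(String expr);
    }

    // Class-global cache shared by all instances, keyed by syntax name and expression.
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    // Per-instance matcher, so instances never share mutable match state.
    private final Matcher matcher;

    EasyRegex(Syntax syntax, String expr) {
        Pattern compiled = CACHE.computeIfAbsent(
            syntax.name() + ':' + expr,
            key -> Pattern.compile(syntax.translate(expr)));
        this.matcher = compiled.matcher("");
    }

    // True if the pattern matches anywhere in the input.
    boolean contains(String input) {
        return matcher.reset(input).find();
    }

    // Group n of the last successful contains(); throws IllegalStateException otherwise.
    String group(int n) {
        return matcher.group(n);
    }

    // Exposed only so the sharing of the cache can be observed.
    static int cacheSize() {
        return CACHE.size();
    }
}
```

Two instances built from the same syntax and expression reuse one compiled
pattern but hold independent matchers.  A single instance is still not safe
to share between threads, since its matcher carries the state of the last
match.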

>May I suggest a plan ( and yes Jon that means im volunteering to do it or at
>least some of it ;-)
>
>1) Fix all the current bugs.
>2) Make a 1.3 release.
>3) Redesign into oro.posix

Sounds like a good plan.  I'll take a gander at the bug list and see if there
aren't one or two I can make the time to help out on.  My impression is the
list is rather much for one person to tackle, although there's a chance
some of the bugs may turn out to be non-bugs if we define the behavior of the
package to be POSIX rather than Perlish (or is it Perlesque?).

>( Any one have a Posix.2 Standard :). Could just cheat and use rx or
>something )

I used to have one available at one of my old jobs and I've been trying to
get the Posix standards ever since.  But last I checked they are out of print
from the IEEE and new ones won't be available until the latest standards
work (merging of Posix, the Single UNIX Specification, etc.) is completed.
However, I think you can use the syntax definition from the Single UNIX
Specification:
    http://www.opengroup.org/onlinepubs/007908799/xbd/re.html

daniel