Posted to dev@commons.apache.org by Tomasz Pik <pi...@ais.pl> on 2002/09/26 15:04:43 UTC

next possible commons project

-------- Original Message --------
Date: 26 Sep 2002 12:58:16 -0000
Message-ID: <20...@nagoya.betaversion.org>
From: bugzilla@apache.org
To: commons-dev@jakarta.apache.org
Cc:
Subject: DO NOT REPLY [Bug 13031] New: - Use regular expression (regex) pattern matching for parsing
Status:


 > Most of the parsing code in HttpClient is custom, and while fast, is
 > potentially error prone.  The translation of the RFC BNF into regex
 > would be a large maintainability improvement.
 >
 > Java 1.4 has a new regex package, and there are others that could be
 > considered if a reliance on 1.4 is to be avoided.

Maybe something like 'Commons Logging' - one hat (with limited
functionality) for most of the regexp packages
(http://regex.info/java.html)?

Regards
Tomek Pik


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: next possible commons project - [regexp]

Posted by Daniel Rall <dl...@finemaltcoding.com>.
Steve Downey <st...@netfolio.com> writes:

> On Thursday 26 September 2002 12:27 pm, Berin Loritsch wrote:
> > Stephen Colebourne wrote:
> > > In the same way as [logging], by not being a regexp package itself.
> > >
> > > Of course it may just not be appropriate...
> >
> > To be honest, I don't like the "autodiscovery" mechanisms in Commons
> > logging.  I would be hard pressed to support another something like
> > that for something less likely to be in widespread use.  It is possible
> > to just use the project that you need and stick with it for RegEx.
> >
> > There are very few projects out there that are meant to be used as a
> > library that require a regex package (that I am aware of, but I don't
> > get out much anymore).  The chances of using two projects that require
> > different RegEx solutions are so minute, that a commons version doesn't
> > seem necessary.
> >
> > That's just my 2 cents.
> 
> The odds of having two projects that require regexp packages that can also 
> tolerate having the definition of regular expression changed underneath them 
> approaches zero.

I agree wholeheartedly with that statement.  Jakarta Regexp does not
handle nearly the same range of regular expressions which ORO handles
(its use case is different).  Trying to treat the two engines as the
same is insanity.
-- 

Daniel Rall <dl...@finemaltcoding.com>



Re: next possible commons project - [regexp]

Posted by Steve Downey <st...@netfolio.com>.
On Thursday 26 September 2002 02:35 pm, Daniel F. Savarese wrote:
> Steve Downey wrote:
> >The odds of having two projects that require regexp packages that can also
> >tolerate having the definition of regular expression changed underneath
> > them approaches zero.
>
> I agree with this as far as most applications are concerned.  I don't
> know the original motivation for this thread, but I can offer the reasons
> why it's thought to be useful in jakarta-oro.

The original motivation was a proposal to have a commons-regexp package like
commons-logging that would abstract out the differences between different
regexp packages. In other words, to be able to plug in either jakarta-regexp
or jakarta-oro as the implementation wherever a regexp is required.

Actually the original motivation was the suggestion that HttpClient's parser 
use regular expressions to match parts of the HTTP spec, rather than write 
the parser by hand. The spec provides BNF, and converting that to regular 
expressions is fairly straightforward. 
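As a minimal sketch of that BNF-to-regex translation, the example below takes the Status-Line production from RFC 2616 and writes it as a java.util.regex pattern. The class name StatusLineParser and the parse helper are hypothetical and are not HttpClient code; the sketch also assumes the trailing CRLF has already been stripped from the line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StatusLineParser {
    // RFC 2616 grammar being translated:
    //   Status-Line   = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
    //   HTTP-Version  = "HTTP" "/" 1*DIGIT "." 1*DIGIT
    //   Status-Code   = 3DIGIT
    //   Reason-Phrase = *<TEXT, excluding CR, LF>
    private static final Pattern STATUS_LINE = Pattern.compile(
        "HTTP/([0-9]+)\\.([0-9]+) ([0-9]{3}) ([^\\r\\n]*)");

    /** Returns {major, minor, code, reason}, or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = STATUS_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] parts = parse("HTTP/1.1 404 Not Found");
        System.out.println(String.join(" | ", parts)); // 1 | 1 | 404 | Not Found
    }
}
```

Each capture group corresponds to one nonterminal of the grammar, so the pattern can be checked against the spec line by line.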

Since the regular expressions would have to be defined and understood by
HttpClient, changing the implementation of the regular expression parser
would probably cause bugs. Either the regexp would be rejected as illegal,
or it would return different results.

Also, one of the motivations for commons-logging was that logging needs to 
interoperate. Two logging packages for an application is a disaster. Two 
regexp packages for an application is an inconvenience.





Re: next possible commons project - [regexp]

Posted by "Daniel F. Savarese" <df...@savarese.org>.
Steve Downey wrote:
>The odds of having two projects that require regexp packages that can also 
>tolerate having the definition of regular expression changed underneath them 
>approaches zero.

I agree with this as far as most applications are concerned.  I don't 
know the original motivation for this thread, but I can offer the reasons
why it's thought to be useful in jakarta-oro.  Having a generic API for
regular expressions allows you to write text processing classes,
for example tokenizers, whose use depends on user-defined regular
expressions.  The developer using the library can then choose a
regular expression grammar that meets his particular needs or fancy.
I view it more as a library-building convenience.  But there are some
instances where an application would make direct use of the facility.
For example, a text search tool that lets you choose a regular expression
grammar that you are familiar with.  You write one search algorithm using
a single set of interfaces, but the instances of those interfaces are
user-determined and decided at run time.

For the most part, however, applications that depend on statically
predefined regular expressions have no use for such a facility.  In
addition, the facility as conceived in jakarta-oro is not quite analogous
to commons-logging because multiple regular expression engines can coexist
in the same application and are not automatically chosen for you, while
it's my understanding that commons-logging dynamically chooses one
underlying logging library (which makes sense for logging).  In any case,
anyone who has an immediate need for dynamically pluggable regular
expression engines is welcome to make the addition to jakarta-oro instead
of waiting for us to get around to it.
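The idea of one search algorithm written against a single set of interfaces, with the grammar picked at run time, might look roughly like the sketch below. The names PluggableSearch, TextMatcher, javaRegex, literal and grep are all hypothetical; this is not the jakarta-oro API, and the two "engines" are only java.util.regex and plain substring matching, chosen so that the example needs nothing beyond the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class PluggableSearch {
    /** Hypothetical engine-neutral interface; the search code sees only this. */
    interface TextMatcher {
        boolean contains(String input);
    }

    /** Adapter that interprets the expression with java.util.regex. */
    static TextMatcher javaRegex(String expression) {
        Pattern p = Pattern.compile(expression);
        return input -> p.matcher(input).find();
    }

    /** Adapter for a trivial "grammar": the expression is a literal substring. */
    static TextMatcher literal(String text) {
        return input -> input.contains(text);
    }

    /** Written once; works with whichever matcher the user selected. */
    static List<String> grep(List<String> lines, TextMatcher matcher) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (matcher.contains(line)) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a.c", "abc", "xyz");
        System.out.println(grep(lines, javaRegex("a.c"))); // [a.c, abc]
        System.out.println(grep(lines, literal("a.c")));   // [a.c]
    }
}
```

The same user input ("a.c") selects different lines under the two grammars, which is why the choice is left to the user here rather than discovered automatically.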

daniel





Re: next possible commons project - [regexp]

Posted by Steve Downey <st...@netfolio.com>.
On Thursday 26 September 2002 12:27 pm, Berin Loritsch wrote:
> Stephen Colebourne wrote:
> > In the same way as [logging], by not being a regexp package itself.
> >
> > Of course it may just not be appropriate...
>
> To be honest, I don't like the "autodiscovery" mechanisms in Commons
> logging.  I would be hard pressed to support another something like
> that for something less likely to be in widespread use.  It is possible
> to just use the project that you need and stick with it for RegEx.
>
> There are very few projects out there that are meant to be used as a
> library that require a regex package (that I am aware of, but I don't
> get out much anymore).  The chances of using two projects that require
> different RegEx solutions are so minute, that a commons version doesn't
> seem necessary.
>
> That's just my 2 cents.

The odds of having two projects that require regexp packages that can also 
tolerate having the definition of regular expression changed underneath them 
approaches zero.




Re: next possible commons project - [regexp]

Posted by Berin Loritsch <bl...@apache.org>.
Stephen Colebourne wrote:
> In the same way as [logging], by not being a regexp package itself.
> 
> Of course it may just not be appropriate...

To be honest, I don't like the "autodiscovery" mechanisms in Commons
logging.  I would be hard pressed to support another something like
that for something less likely to be in widespread use.  It is possible
to just use the project that you need and stick with it for RegEx.

There are very few projects out there that are meant to be used as a
library that require a regex package (that I am aware of, but I don't
get out much anymore).  The chances of using two projects that require
different RegEx solutions are so minute, that a commons version doesn't
seem necessary.

That's just my 2 cents.


-- 

"They that give up essential liberty to obtain a little temporary safety
  deserve neither liberty nor safety."
                 - Benjamin Franklin




Re: next possible commons project - [regexp]

Posted by Stephen Colebourne <sc...@btopenworld.com>.
> I just want to point out that jakarta-oro is more than just a regular
> expression package and already contains the generic interfaces to wrap other
> packages (as it is, oro already implements 3 different regular expression
> grammars).  It's a simple matter to add a factory to generate generic
> matchers for arbitrary regular expression packages.  The project has
> plans to do this pending the completion of some other work, but anyone
> with an immediate need is welcome to put forth a design and start
> implementing.

Sounds like no work needed in commons then ;-)

Stephen




Re: next possible commons project - [regexp]

Posted by "Daniel F. Savarese" <df...@savarese.org>.
Stephen Colebourne writes:
>In the same way as [logging], by not being a regexp package itself.
>
>Of course it may just not be appropriate...

I just want to point out that jakarta-oro is more than just a regular
expression package and already contains the generic interfaces to wrap other
packages (as it is, oro already implements 3 different regular expression
grammars).  It's a simple matter to add a factory to generate generic
matchers for arbitrary regular expression packages.  The project has
plans to do this pending the completion of some other work, but anyone
with an immediate need is welcome to put forth a design and start
implementing.
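A factory of the kind described could be as small as a registry keyed by grammar name. The sketch below is only an illustration under that assumption: MatcherFactory, register, create and withDefaults are hypothetical names, matchers are modelled as java.util.function.Predicate, and the two registered grammars use only the JDK rather than any jakarta-oro class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.regex.Pattern;

final class MatcherFactory {
    // Maps a grammar name to a function that compiles an expression
    // into a whole-string matcher.
    private final Map<String, Function<String, Predicate<String>>> compilers =
        new HashMap<>();

    void register(String grammar, Function<String, Predicate<String>> compiler) {
        compilers.put(grammar, compiler);
    }

    Predicate<String> create(String grammar, String expression) {
        Function<String, Predicate<String>> compiler = compilers.get(grammar);
        if (compiler == null) {
            throw new IllegalArgumentException("Unknown grammar: " + grammar);
        }
        return compiler.apply(expression);
    }

    /** Factory preloaded with two illustrative grammars. */
    static MatcherFactory withDefaults() {
        MatcherFactory factory = new MatcherFactory();
        factory.register("java", expr -> {
            Pattern p = Pattern.compile(expr);
            return s -> p.matcher(s).matches();
        });
        factory.register("literal", expr -> s -> s.equals(expr));
        return factory;
    }

    public static void main(String[] args) {
        MatcherFactory factory = withDefaults();
        System.out.println(factory.create("java", "a+b").test("aaab"));    // true
        System.out.println(factory.create("literal", "a+b").test("aaab")); // false
    }
}
```

The caller names the grammar explicitly, so nothing is chosen behind its back and several engines can be registered side by side.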

daniel





Re: next possible commons project - [regexp]

Posted by Stephen Colebourne <sc...@btopenworld.com>.
In the same way as [logging], by not being a regexp package itself.

Of course it may just not be appropriate...

Stephen

----- Original Message -----
From: "Berin Loritsch" <bl...@apache.org>
To: "Jakarta Commons Developers List" <co...@jakarta.apache.org>
Sent: Thursday, September 26, 2002 4:40 PM
Subject: Re: next possible commons project - [regexp]


> Daniel Rall wrote:
> > Jeff Dever <js...@sympatico.ca> writes:
> >
> >
> >>Jakarta does have a top level regexp project:
> >>http://jakarta.apache.org/regexp/
> >
> >
> > Jakarta Regexp is the more simple of the two Jakarta regex packages (I
> > believe that Tomcat uses this one).  ORO is much more full featured,
> > offering the full power of Perl 5 regexes.
> >
> > How would yet another regex package seek to differentiate itself from
> > the existing two?
>
> AND JDK 1.4 Regex?
>
> --
>
> "They that give up essential liberty to obtain a little temporary safety
>   deserve neither liberty nor safety."
>                  - Benjamin Franklin
>
>




Re: next possible commons project - [regexp]

Posted by Berin Loritsch <bl...@apache.org>.
Daniel Rall wrote:
> Jeff Dever <js...@sympatico.ca> writes:
> 
> 
>>Jakarta does have a top level regexp project:
>>http://jakarta.apache.org/regexp/
> 
> 
> Jakarta Regexp is the more simple of the two Jakarta regex packages (I
> believe that Tomcat uses this one).  ORO is much more full featured,
> offering the full power of Perl 5 regexes.
> 
> How would yet another regex package seek to differentiate itself from
> the existing two?

AND JDK 1.4 Regex?

-- 

"They that give up essential liberty to obtain a little temporary safety
  deserve neither liberty nor safety."
                 - Benjamin Franklin




Re: next possible commons project - [regexp]

Posted by Daniel Rall <dl...@finemaltcoding.com>.
Jeff Dever <js...@sympatico.ca> writes:

> Jakarta does have a top level regexp project:
> http://jakarta.apache.org/regexp/

Jakarta Regexp is the more simple of the two Jakarta regex packages (I
believe that Tomcat uses this one).  ORO is much more full featured,
offering the full power of Perl 5 regexes.

How would yet another regex package seek to differentiate itself from
the existing two?
-- 

Daniel Rall <dl...@finemaltcoding.com>



Re: next possible commons project - [regexp]

Posted by Jeff Dever <js...@sympatico.ca>.
Jakarta does have a top level regexp project:
http://jakarta.apache.org/regexp/

I have never used it, but it would certainly be a candidate.  Is it still alive?



Stephen Colebourne wrote:

> I've had this idea too, but as I don't have a burning need I haven't done
> anything about it.
>
> BTW: The [lang] CharSetUtils class is a *possible* very very very basic
> regexp class that could be slotted in as a default implementation.
>
> Stephen
>
> From: "Tomasz Pik" <pi...@ais.pl>
> > -------- Original Message --------
> >  > Java 1.4 has a new regex package, and there are others that could be
> >  > considered if a reliance on 1.4 is to be avoided.
> >
> > Maybe something like 'Commons Logging' - one hat (with limited
> > functionality) for most of the regexp packages
> > (http://regex.info/java.html)?
> >
> > Regards
> > Tomek Pik
> >
> >




Re: next possible commons project - [regexp]

Posted by Stephen Colebourne <sc...@btopenworld.com>.
I've had this idea too, but as I don't have a burning need I haven't done
anything about it.

BTW: The [lang] CharSetUtils class is a *possible* very very very basic
regexp class that could be slotted in as a default implementation.

Stephen

From: "Tomasz Pik" <pi...@ais.pl>
> -------- Original Message --------
>  > Java 1.4 has a new regex package, and there are others that could be
>  > considered if a reliance on 1.4 is to be avoided.
>
> Maybe something like 'Commons Logging' - one hat (with limited
> functionality) for most of the regexp packages
> (http://regex.info/java.html)?
>
> Regards
> Tomek Pik
>
>




Re: next possible commons project

Posted by Steve Downey <st...@netfolio.com>.
I don't think it makes sense. From a programmatic point of view, logging is
side-effect free. Differences between logging packages are completely
ignorable, even if the logs go to /dev/null.

That is not the case for regular expressions. 

Different regular expression packages treat expressions differently. And 
converting between them is NOT possible. 

Differences would be very visible to the calling program. A match would return 
true with one package, false with another, and throw an exception with a 
third.
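To make the point concrete, here is a small sketch using only java.util.regex. It feeds the JDK engine expressions written for the POSIX grammar used by tools such as grep and reports whether each one matches, fails to match, or is rejected. The class GrammarDifferences and the helper tryMatch are hypothetical names; the comments describe the POSIX behaviour for comparison, since no POSIX engine is run here.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class GrammarDifferences {
    /** Whole-string match in java.util.regex: "true", "false", or "error". */
    static String tryMatch(String expression, String input) {
        try {
            return String.valueOf(Pattern.matches(expression, input));
        } catch (PatternSyntaxException e) {
            return "error";
        }
    }

    public static void main(String[] args) {
        // POSIX reads [[:alpha:]] as "any letter", so this would match there.
        // java.util.regex reads it as a nested class containing the
        // characters ':', 'a', 'l', 'p', 'h', so "hello" does not match.
        System.out.println(tryMatch("[[:alpha:]]+", "hello")); // false

        // The java.util.regex spelling of the same intent.
        System.out.println(tryMatch("\\p{Alpha}+", "hello"));  // true

        // In a POSIX basic regular expression a leading '*' is an ordinary
        // character; java.util.regex rejects it as a dangling metacharacter.
        System.out.println(tryMatch("*abc", "*abc"));          // error
    }
}
```

One expression text therefore yields true, false, or an exception depending on which grammar interprets it, without the calling code changing at all.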

There's a reason you have grep and egrep, and not just a single program. Not 
to mention perl, emacs, awk, ...

The only sensible thing to do is to pick one and write the regular expressions 
in terms of that package, and test against that package. Now, jakarta does 
have two. I've always used ORO. I don't remember why I didn't use regexp. 

The fact that oro has had activity in the last 6 months is an important point, 
I think. The last two commits in regexp?
2001-08-24 17:58  jon

        * build/build-regexp.xml (1.10): do just enough work to allow an
        install-jar target

        this stuff needs to be upgraded

2001-04-12 18:03  gholam

        * build/build-regexp.xml (1.9): Added a few property imports users
        can override some settings.  Is after the setting they should not
        be able to change eg project etc.

        Removed the build.compiler property from the build script.  I
        prefer to have a global setting for this in
        ~/.ant/compiler.properties which can be overriden in this case by a
        value in ~/.ant/jakarta-regexp.properties.





On Thursday 26 September 2002 09:04 am, Tomasz Pik wrote:
> -------- Original Message --------
> Date: 26 Sep 2002 12:58:16 -0000
> Message-ID: <20...@nagoya.betaversion.org>
> From: bugzilla@apache.org
> To: commons-dev@jakarta.apache.org
> Cc:
> Subject: DO NOT REPLY [Bug 13031] New: - Use regular expression (regex)
> pattern matching for parsing
>
> Status:
>  > Most of the parsing code in HttpClient is custom, and while fast, is
>  > potentially error prone.  The translation of the RFC BNF into regex
>  > would be a large maintainability improvement.
>  >
>  > Java 1.4 has a new regex package, and there are others that could be
>  > considered if a reliance on 1.4 is to be avoided.
>
> Maybe something like 'Commons Logging' - one hat (with limited
> functionality) for most of the regexp packages
> (http://regex.info/java.html)?
>
> Regards
> Tomek Pik

