Posted to dev@commons.apache.org by Benson Margulies <bi...@gmail.com> on 2015/01/31 16:58:35 UTC

Anyone interested in regular expressions, again?

So, once upon a time, there was a regex library here. It was retired,
presumably on the grounds that it was rendered obsolete by the JRE's
native support.

However, the JRE's regular expressions have a pretty severe problem:
they have unbounded (or at least very, very bad) execution time for
some combinations of data and regex.
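
As a rough illustration of the kind of input that triggers this, here is
a minimal sketch using only java.util.regex. The pattern "(a+)+b" and the
class and method names are assumptions chosen for the example, not
anything taken from the libraries discussed in this thread. Nested
quantifiers applied to a run of 'a' characters with no trailing 'b' force
a backtracking matcher to try many ways of splitting the run before it
can report failure. Timings are printed rather than asserted, because
they depend on the JDK version and the machine, and some JDK releases may
handle this particular pattern better than others.

```java
import java.util.regex.Pattern;

public class BacktrackingDemo {
    // Assumed example pattern: nested quantifiers, the classic shape that
    // can make a backtracking matcher explore a very large search space.
    static final Pattern NESTED = Pattern.compile("(a+)+b");

    /** Builds a string of n 'a' characters with no trailing 'b', so a full match must fail. */
    static String input(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.toString();
    }

    /** Returns the elapsed nanoseconds for one failed match attempt on n characters. */
    static long timeFailedMatch(int n) {
        String s = input(n);
        long start = System.nanoTime();
        boolean matched = NESTED.matcher(s).matches();
        long elapsed = System.nanoTime() - start;
        if (matched) {
            throw new IllegalStateException("unexpected match");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Keep n small: each extra character can roughly double the work.
        for (int n = 10; n <= 22; n += 4) {
            System.out.printf("n=%d: %d us%n", n, timeFailedMatch(n) / 1_000);
        }
    }
}
```

Running main and comparing the printed times for growing n is a quick
way to check whether a given JDK shows the blow-up for this pattern.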

To cope with this, we ported the Henry Spencer regular expression
library (as found in TCL) from C to Java.

Thus: https://github.com/basis-technology-corp/tcl-regex-java

Is anyone interested in this? Give or take the possible IP muddle of
the original C code, I could grant it easily.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: Anyone interested in regular expressions, again?

Posted by Ben McCann <be...@benmccann.com>.
I like the idea. I've run into this problem before. The library I've been
most familiar with is re2 <https://code.google.com/p/re2/> for C++, but a
popular Java version would be great.


On Sat, Jan 31, 2015 at 11:27 AM, Phil Steitz <ph...@gmail.com> wrote:

> Anyone know the probability that JDK lameness gets addressed "soon?"
>
> If the answer is anything other than "in progress" I would see this
> as a good idea and Commons is a good place for it.  You can start in
> the sandbox any time you want and see if others wander by to join.
> Normal Incubator IP clearance rules apply; but other than that, it's
> JFDI.
>
> Phil


-- 
about.me/benmccann

Re: Anyone interested in regular expressions, again?

Posted by Phil Steitz <ph...@gmail.com>.

Anyone know the probability that JDK lameness gets addressed "soon?"

If the answer is anything other than "in progress" I would see this
as a good idea and Commons is a good place for it.  You can start in
the sandbox any time you want and see if others wander by to join. 
Normal Incubator IP clearance rules apply; but other than that, it's
JFDI.

Phil


Re: Anyone interested in regular expressions, again?

Posted by Ben McCann <be...@benmccann.com>.
That'd be awesome, James. I'd love to see re2j out in the open.

On Mon, Feb 2, 2015 at 2:20 PM, James Ring <sj...@jdns.org> wrote:

> I spoke to one of the authors of re2j, a Google-internal port of the C++
> re2 library. The intention was to open source it but they just haven't got
> around to it.
>
> I may try and get Google to put re2j up on GitHub so you all can take a
> look. AFAIK it is heavily used in Google and it has an API that is largely
> compatible with java.util.regex. I know from personal experience that one
> can often benefit from re2j merely by replacing java.util.regex imports
> with the corresponding re2j imports.
>
> Regards,
> James
> On Feb 1, 2015 11:44 PM, "Thomas Neidhart" <th...@gmail.com>
> wrote:
>
> > On 02/02/2015 03:25 AM, sebb wrote:
> > > I would not wish to move away from Java RE *unless* the RE syntax was
> > > the same *and* the implementation was better performing *and* the
> > > existing code suffered from poor performance.
> > >
> > > It might be OK if the alternate implementation was missing some
> > > esoteric features, but I would be very wary of using any features that
> > > were not in the Java implementation.
> > >
> > > The likelihood is that the Java implementation will (eventually)
> > > become more performant, at which point it would be useful to be able
> > > to revert to the Java version.
> > > That requires a high degree of compatibility to reduce the work
> > > involved.
> > >
> > > It might be more useful to produce a tool that detects inefficient RE
> > > usage and suggests improvements.
> >
> > I just know re2 a bit, but it is a trade-off:
> >
> >  * linear-time evaluation vs. some features (e.g. backreferences)
> >
> > A comparison between different regular expression implementations can be
> > found here:
> >
> > http://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines
> >
> > I am pretty sure the regexp implementation in java will not change,
> > simply because of backwards compatibility reasons, but such a library
> > would be useful as in many cases you do not need these additional
> > features but want to ensure that your regular expression will be
> > evaluated in linear time.
> >
> > Thomas
> >
> > >
> > >
> > > On 1 February 2015 at 22:35, James Carman <ja...@carmanconsulting.com> wrote:
> > >> To be clear, I am not advocating this approach.  I was merely trying to
> > >> illustrate what a nightmare such an endeavor would be. :)
> > >>
> > >> On Sunday, February 1, 2015, James Carman <james@carmanconsulting.com> wrote:
> > >>
> > >>> You would basically have to pick a canonical regex language if you want a
> > >>> facade and be able to swap the regex library out.  Most of them are very
> > >>> similar but they are not the same.
> > >>>
> > >>> On Sunday, February 1, 2015, Gary Gregory <garydgregory@gmail.com> wrote:
> > >>>
> > >>>> I think we'll need some clear performance advantages documented as well as
> > >>>> any compatibility issues.
> > >>>>
> > >>>> This begs for a facade API IMO. I would not want to recode my app just to
> > >>>> test one vs. the other, it should be pluggable.
> > >>>>
> > >>>> Gary
>
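
The facade idea in the quoted discussion above can be sketched roughly
as follows. This is a minimal sketch under stated assumptions, not an
existing Commons API: the names RegexEngine, CompiledRegex and JdkEngine
are hypothetical, and only a java.util.regex adapter is shown so that the
example depends on nothing outside the JDK. An adapter for another
engine would be a second implementation of the same interface. As the
thread notes, such a facade is only safe for the regex syntax that all
the plugged-in engines share.

```java
import java.util.regex.Pattern;

public class RegexFacadeSketch {
    /** Hypothetical facade: the only type application code would depend on. */
    interface RegexEngine {
        CompiledRegex compile(String regex);
    }

    /** Hypothetical handle for a compiled expression. */
    interface CompiledRegex {
        /** True if the whole input matches the expression. */
        boolean matchesAll(CharSequence input);

        /** True if any part of the input matches the expression. */
        boolean findsIn(CharSequence input);
    }

    /** Adapter that delegates to java.util.regex. */
    static final class JdkEngine implements RegexEngine {
        @Override
        public CompiledRegex compile(String regex) {
            final Pattern pattern = Pattern.compile(regex);
            return new CompiledRegex() {
                @Override
                public boolean matchesAll(CharSequence input) {
                    return pattern.matcher(input).matches();
                }

                @Override
                public boolean findsIn(CharSequence input) {
                    return pattern.matcher(input).find();
                }
            };
        }
    }

    public static void main(String[] args) {
        // Swapping engines would mean changing only this one line.
        RegexEngine engine = new JdkEngine();
        CompiledRegex digits = engine.compile("[0-9]+");
        System.out.println(digits.matchesAll("12345"));
        System.out.println(digits.findsIn("abc 42"));
    }
}
```

Keeping the interface this small is a deliberate choice in the sketch:
the fewer operations the facade exposes, the less the engines' syntax
and feature differences can leak into calling code.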



-- 
about.me/benmccann

Re: Anyone interested in regular expressions, again?

Posted by "Bruno P. Kinoshita" <br...@yahoo.com.br>.
Someone posted to Hacker News too. Here's the link to the comments there: https://news.ycombinator.com/item?id=9070593

Cheers,
Bruno

 
      From: Benson Margulies <bi...@gmail.com>
 To: Commons Developers List <de...@commons.apache.org>; Bruno P. Kinoshita <br...@yahoo.com.br> 
 Sent: Tuesday, February 17, 2015 11:53 PM
 Subject: Re: Anyone interested in regular expressions, again?
   
Thanks! I look forward to retiring mine!

On Tue, Feb 17, 2015 at 8:48 PM, Bruno P. Kinoshita
<br...@yahoo.com.br> wrote:
> That's great news! Thanks James!
> Bruno
>
>      From: James Ring <sj...@jdns.org>
>  To: Commons Developers List <de...@commons.apache.org>
>  Sent: Tuesday, February 17, 2015 11:31 PM
>  Subject: Re: Anyone interested in regular expressions, again?
>
> Hey Benson,
>
> Just wanted to let you and the rest of the commons dev list know that re2j
> is now in the open: please see https://github.com/google/re2j. Please
> take a look!
>
> Regards,
> James
>
> On Mon, Feb 9, 2015 at 12:16 PM, Benson Margulies <bi...@gmail.com> wrote:
>> On Mon, Feb 9, 2015 at 1:36 PM, James Ring <sj...@jdns.org> wrote:
>>> I'm working to bring re2j into the open, it will take some time
>>> because Google's internal procedures for this kind of thing are pretty
>>> lengthy. I'm hopeful it could be done in the next month or so.
>>
>> That is lovely news. Thanks!
>>
>>>
>>> On Tue, Feb 3, 2015 at 12:14 PM, Benson Margulies <bi...@gmail.com> wrote:
>>>> On Tue, Feb 3, 2015 at 2:39 AM, Thomas Neidhart
>>>> <th...@gmail.com> wrote:
>>>>> On 02/03/2015 01:46 AM, Benson Margulies wrote:
>>>>>> The irony here is that the Java HSRE port happened because it seemed
>>>>>> easier than an RE2 port. Note the same statements about API's pretty
>>>>>> much apply.
>>>>>
>>>>> I am sorry, my response was not very sensible wrt your original proposal.
>>>>
>>>> It seems very sensible to me. A team at Google producing re2j is
>>>> likely to have produced a far superior comestible to what I did. If
>>>> there's any possibility that it will emerge in, oh, a month or two, I
>>>> don't think it makes sense to go to the trouble to pull the HSRE code
>>>> into Apache.
>>>>
>>>>>
>>>>> If we have another implementation that works fine and has a sufficiently
>>>>> large community, then I do not see a problem with including it in the
>>>>> commons project; I would certainly be interested.
>>>>>
>>>>> Thomas
>>>>>
>>>>>> On Mon, Feb 2, 2015 at 6:21 PM, Thomas Neidhart
>>>>>> <th...@gmail.com> wrote:
>>>>>>> On 02/02/2015 11:20 PM, James Ring wrote:
>>>>>>>> I spoke to one of the authors of re2j, a Google-internal port of the C++
>>>>>>>> re2 library. The intention was to open source it but they just haven't got
>>>>>>>> around to it.
>>>>>>>>
>>>>>>>> I may try and get Google to put re2j up on GitHub so you all can take a
>>>>>>>> look. AFAIK it is heavily used in Google and it has an API that is largely
>>>>>>>> compatible with java.util.regex. I know from personal experience that one
>>>>>>>> can often benefit from re2j merely by replacing java.util.regex imports
>>>>>>>> with the corresponding re2j imports.
>>>>>>>
>>>>>>> that would be super-cool.
>>>>>>>
>>>>>>> Thomas
>>>>>>>

Re: Anyone interested in regular expressions, again?

Posted by Benson Margulies <bi...@gmail.com>.
Thanks! I look forward to retiring mine!



Re: Anyone interested in regular expressions, again?

Posted by "Bruno P. Kinoshita" <br...@yahoo.com.br>.
That's great news! Thanks James!
Bruno
 

Re: Anyone interested in regular expressions, again?

Posted by James Ring <sj...@jdns.org>.
Hey Benson,

Just wanted to let you and the rest of the commons dev list know that re2j
is now in the open: please see https://github.com/google/re2j. Please
take a look!

Regards,
James



Re: Anyone interested in regular expressions, again?

Posted by Benson Margulies <bi...@gmail.com>.
On Mon, Feb 9, 2015 at 1:36 PM, James Ring <sj...@jdns.org> wrote:
> I'm working to bring re2j into the open, it will take some time
> because Google's internal procedures for this kind of thing are pretty
> lengthy. I'm hopeful it could be done in the next month or so.

That is lovely news. Thanks!



Re: Anyone interested in regular expressions, again?

Posted by James Ring <sj...@jdns.org>.
I'm working to bring re2j into the open; it will take some time
because Google's internal procedures for this kind of thing are pretty
lengthy. I'm hopeful it could be done in the next month or so.

On Tue, Feb 3, 2015 at 12:14 PM, Benson Margulies <bi...@gmail.com> wrote:
> On Tue, Feb 3, 2015 at 2:39 AM, Thomas Neidhart
> <th...@gmail.com> wrote:
>> On 02/03/2015 01:46 AM, Benson Margulies wrote:
>>> The irony here is that the Java HSRE port happened because it seemed
>>> easier than an RE2 port. Note the same statements about API's pretty
>>> much apply.
>>
>> I am sorry, my response was not very sensible wrt your original proposal.
>
> It seems very sensible to me. A team at Google producing re2j is
> likely to have produced a far superior comestible to what I did. If
> there's any possibility that it will emerge in, oh, a month or two, I
> don't think it makes sense to go to the trouble to pull the HSRE code
> into Apache.
>
>>
>> If we have another implementation that works fine and has a sufficiently
>> large enough community then I do not see a problem to include it in the
>> commons project, I would certainly be interested.
>>
>> Thomas
>>
>>> On Mon, Feb 2, 2015 at 6:21 PM, Thomas Neidhart
>>> <th...@gmail.com> wrote:
>>>> On 02/02/2015 11:20 PM, James Ring wrote:
>>>>> I spoke to one of the authors of re2j, a Google-internal port of the C++
>>>>> re2 library. The intention was to open source it but they just haven't got
>>>>> around to it.
>>>>>
>>>>> I may try and get Google to put re2j up on GitHub so you all can take a
>>>>> look. AFAIK it is heavily used in Google and it has an API that is largely
>>>>> compatible with java.util.regex. I know from personal experience that one
>>>>> can often benefit from re2j merely by replacing java.util.regex imports
>>>>> with the corresponding re2j imports.
>>>>
>>>> that would be super-cool.
>>>>
>>>> Thomas


Re: Anyone interested in regular expressions, again?

Posted by Benson Margulies <bi...@gmail.com>.
On Tue, Feb 3, 2015 at 2:39 AM, Thomas Neidhart
<th...@gmail.com> wrote:
> On 02/03/2015 01:46 AM, Benson Margulies wrote:
>> The irony here is that the Java HSRE port happened because it seemed
>> easier than an RE2 port. Note the same statements about API's pretty
>> much apply.
>
> I am sorry, my response was not very sensible wrt your original proposal.

It seems very sensible to me. A team at Google producing re2j is
likely to have produced a far superior comestible to what I did. If
there's any possibility that it will emerge in, oh, a month or two, I
don't think it makes sense to go to the trouble to pull the HSRE code
into Apache.

>
> If we have another implementation that works fine and has a sufficiently
> large enough community then I do not see a problem to include it in the
> commons project, I would certainly be interested.
>
> Thomas
>
>> On Mon, Feb 2, 2015 at 6:21 PM, Thomas Neidhart
>> <th...@gmail.com> wrote:
>>> On 02/02/2015 11:20 PM, James Ring wrote:
>>>> I spoke to one of the authors of re2j, a Google-internal port of the C++
>>>> re2 library. The intention was to open source it but they just haven't got
>>>> around to it.
>>>>
>>>> I may try and get Google to put re2j up on GitHub so you all can take a
>>>> look. AFAIK it is heavily used in Google and it has an API that is largely
>>>> compatible with java.util.regex. I know from personal experience that one
>>>> can often benefit from re2j merely by replacing java.util.regex imports
>>>> with the corresponding re2j imports.
>>>
>>> that would be super-cool.
>>>
>>> Thomas


Re: Anyone interested in regular expressions, again?

Posted by Thomas Neidhart <th...@gmail.com>.
On 02/03/2015 01:46 AM, Benson Margulies wrote:
> The irony here is that the Java HSRE port happened because it seemed
> easier than an RE2 port. Note the same statements about API's pretty
> much apply.

I am sorry, my response was not very sensible wrt your original proposal.

If we have another implementation that works fine and has a sufficiently
large community, then I do not see a problem with including it in the
commons project; I would certainly be interested.

Thomas

> On Mon, Feb 2, 2015 at 6:21 PM, Thomas Neidhart
> <th...@gmail.com> wrote:
>> On 02/02/2015 11:20 PM, James Ring wrote:
>>> I spoke to one of the authors of re2j, a Google-internal port of the C++
>>> re2 library. The intention was to open source it but they just haven't got
>>> around to it.
>>>
>>> I may try and get Google to put re2j up on GitHub so you all can take a
>>> look. AFAIK it is heavily used in Google and it has an API that is largely
>>> compatible with java.util.regex. I know from personal experience that one
>>> can often benefit from re2j merely by replacing java.util.regex imports
>>> with the corresponding re2j imports.
>>
>> that would be super-cool.
>>
>> Thomas


Re: Anyone interested in regular expressions, again?

Posted by Benson Margulies <bi...@gmail.com>.
The irony here is that the Java HSRE port happened because it seemed
easier than an RE2 port. Note that the same statements about APIs pretty
much apply.

On Mon, Feb 2, 2015 at 6:21 PM, Thomas Neidhart
<th...@gmail.com> wrote:
> On 02/02/2015 11:20 PM, James Ring wrote:
>> I spoke to one of the authors of re2j, a Google-internal port of the C++
>> re2 library. The intention was to open source it but they just haven't got
>> around to it.
>>
>> I may try and get Google to put re2j up on GitHub so you all can take a
>> look. AFAIK it is heavily used in Google and it has an API that is largely
>> compatible with java.util.regex. I know from personal experience that one
>> can often benefit from re2j merely by replacing java.util.regex imports
>> with the corresponding re2j imports.
>
> that would be super-cool.
>
> Thomas


Re: Anyone interested in regular expressions, again?

Posted by Thomas Neidhart <th...@gmail.com>.
On 02/02/2015 11:20 PM, James Ring wrote:
> I spoke to one of the authors of re2j, a Google-internal port of the C++
> re2 library. The intention was to open source it but they just haven't got
> around to it.
> 
> I may try and get Google to put re2j up on GitHub so you all can take a
> look. AFAIK it is heavily used in Google and it has an API that is largely
> compatible with java.util.regex. I know from personal experience that one
> can often benefit from re2j merely by replacing java.util.regex imports
> with the corresponding re2j imports.

that would be super-cool.

Thomas




Re: Anyone interested in regular expressions, again?

Posted by James Ring <sj...@jdns.org>.
I spoke to one of the authors of re2j, a Google-internal port of the C++
re2 library. The intention was to open source it but they just haven't got
around to it.

I may try and get Google to put re2j up on GitHub so you all can take a
look. AFAIK it is heavily used in Google and it has an API that is largely
compatible with java.util.regex. I know from personal experience that one
can often benefit from re2j merely by replacing java.util.regex imports
with the corresponding re2j imports.
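
As a rough sketch of what "replacing imports" would look like: the code below runs against java.util.regex as written. The commented-out imports show the assumed swap; the package name com.google.re2j is an assumption here, since the library was not public when this was posted, and the class and method names are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Assumed drop-in alternative (package name is an assumption, not from this thread):
// import com.google.re2j.Matcher;
// import com.google.re2j.Pattern;

// Hypothetical example class; only the imports above would change.
class ImportSwapSketch {
    // Returns the first run of ASCII digits in the input, or null if none.
    static String firstNumber(String input) {
        Pattern p = Pattern.compile("[0-9]+");
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("order 1234 shipped"));
    }
}
```

This only holds for code that sticks to the common subset of the two APIs and to regex syntax both engines accept.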

Regards,
James
On Feb 1, 2015 11:44 PM, "Thomas Neidhart" <th...@gmail.com>
wrote:

> On 02/02/2015 03:25 AM, sebb wrote:
> > I would not wish to move away from Java RE *unless* the RE syntax was
> > the same *and* the implementation was better performing *and* the
> > existing code suffered from poor performance.
> >
> > It might be OK if the alternate implementation was missing some
> > esoteric features, but I would be very wary of using any features that
> > were not in the Java implementation.
> >
> > The likelihood is that the Java implementation will (eventually)
> > become more performant, at which point it would be useful to be able
> > to revert to the Java version.
> > That requires a high degree of compatibility to reduce the work involved.
> >
> > It might be more useful to produce a tool that detects inefficient RE
> > usage and suggests improvements.
>
> I just know re2 a bit, but it is a trade-off:
>
>  * linear-time evaluation vs. some features (e.g. backreferences)
>
> A comparison between different regular expression implementations can be
> found here:
>
> http://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines
>
> I am pretty sure the regexp implementation in java will not change,
> simply because of backwards compatibility reasons, but such a library
> would be useful as in many cases you do not need these additional
> features but want to ensure that your regular expression will be
> evaluated in linear time.
>
> Thomas
>
> >
> >
> > On 1 February 2015 at 22:35, James Carman <ja...@carmanconsulting.com> wrote:
> >> To be clear, I am not advocating this approach.  I was merely trying to
> >> illustrate what a nightmare such an endeavor would be. :)
> >>
> >> On Sunday, February 1, 2015, James Carman <ja...@carmanconsulting.com>
> >> wrote:
> >>
> >>> You would basically have to pick a canonical regex language if you want a
> >>> facade and be able to swap the regex library out.  Most of them are very
> >>> similar but they are not the same.
> >>>
> >>> On Sunday, February 1, 2015, Gary Gregory <garydgregory@gmail.com> wrote:
> >>>
> >>>> I think we'll need some clear performance advantages documented as well as
> >>>> any compatibility issues.
> >>>>
> >>>> This begs for a facade API IMO. I would not want to recode my app just to
> >>>> test one vs. the other, it should be pluggable.
> >>>>
> >>>> Gary
> >>>>
> >>>> On Sat, Jan 31, 2015 at 10:58 AM, Benson Margulies <bimargulies@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> So, once upon a time, there was a regex library here. It was retired,
> >>>>> presumably on the grounds that it was rendered obsolete by the JRE's
> >>>>> native support.
> >>>>>
> >>>>> However, the JRE's regular expressions have a pretty severe problem;
> >>>>> they have unbounded (or at least, very, very, bad) execution time for
> >>>>> some combinations of data and regex.
> >>>>>
> >>>>> To cope with this, we ported the Henry Spencer regular expression
> >>>>> library (as found in TCL) from C to Java.
> >>>>>
> >>>>> Thus: https://github.com/basis-technology-corp/tcl-regex-java
> >>>>>
> >>>>> Is anyone interested in this? Give or take the possible IP muddle of
> >>>>> the original C Code, I could grant it easily.
> >>>>>

Re: Anyone interested in regular expressions, again?

Posted by Thomas Neidhart <th...@gmail.com>.
On 02/02/2015 03:25 AM, sebb wrote:
> I would not wish to move away from Java RE *unless* the RE syntax was
> the same *and* the implementation was better performing *and* the
> existing code suffered from poor performance.
> 
> It might be OK if the alternate implementation was missing some
> esoteric features, but I would be very wary of using any features that
> were not in the Java implementation.
> 
> The likelihood is that the Java implementation will (eventually)
> become more performant, at which point it would be useful to be able
> to revert to the Java version.
> That requires a high degree of compatibility to reduce the work involved.
> 
> It might be more useful to produce a tool that detects inefficient RE
> usage and suggests improvements.

I just know re2 a bit, but it is a trade-off:

 * linear-time evaluation vs. some features (e.g. backreferences)

A comparison between different regular expression implementations can be
found here:

http://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines

I am pretty sure the regexp implementation in Java will not change,
simply because of backwards compatibility reasons, but such a library
would be useful as in many cases you do not need these additional
features but want to ensure that your regular expression will be
evaluated in linear time.
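
To make the trade-off concrete, here is a small sketch of a backreference, the kind of feature mentioned above that a linear-time engine gives up. It uses only java.util.regex; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example: finds a word that is immediately repeated.
class BackreferenceSketch {
    // \1 must match exactly the text captured by group 1. This is a
    // backreference, which a backtracking engine like the JDK's supports.
    private static final Pattern DOUBLED_WORD =
            Pattern.compile("\\b(\\w+) \\1\\b");

    // Returns the repeated word, or null if no word is doubled.
    static String firstDoubledWord(String text) {
        Matcher m = DOUBLED_WORD.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstDoubledWord("it was the the best of times"));
    }
}
```

A pattern like this cannot be moved to an engine without backreferences; patterns that avoid such features are the ones that could benefit from a linear-time guarantee.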

Thomas

> 
> 
> On 1 February 2015 at 22:35, James Carman <ja...@carmanconsulting.com> wrote:
>> To be clear, I am not advocating this approach.  I was merely trying to
>> illustrate what a nightmare such an endeavor would be. :)
>>
>> On Sunday, February 1, 2015, James Carman <ja...@carmanconsulting.com>
>> wrote:
>>
>>> You would basically have to pick a canonical regex language if you want a
>>> facade and be able to swap the regex library out.  Most of them are very
>>> similar but they are not the same.
>>>
>>> On Sunday, February 1, 2015, Gary Gregory <garydgregory@gmail.com> wrote:
>>>
>>>> I think we'll need some clear performance advantages documented as well as
>>>> any compatibility issues.
>>>>
>>>> This begs for a facade API IMO. I would not want to recode my app just to
>>>> test one vs. the other, it should be pluggable.
>>>>
>>>> Gary
>>>>
>>>> On Sat, Jan 31, 2015 at 10:58 AM, Benson Margulies <bimargulies@gmail.com>
>>>> wrote:
>>>>
>>>>> So, once upon a time, there was a regex library here. It was retired,
>>>>> presumably on the grounds that it was rendered obsolete by the JRE's
>>>>> native support.
>>>>>
>>>>> However, the JRE's regular expressions have a pretty severe problem;
>>>>> they have unbounded (or at least, very, very, bad) execution time for
>>>>> some combinations of data and regex.
>>>>>
>>>>> To cope with this, we ported the Henry Spencer regular expression
>>>>> library (as found in TCL) from C to Java.
>>>>>
>>>>> Thus: https://github.com/basis-technology-corp/tcl-regex-java
>>>>>
>>>>> Is anyone interested in this? Give or take the possible IP muddle of
>>>>> the original C Code, I could grant it easily.
>>>>>


Re: Anyone interested in regular expressions, again?

Posted by sebb <se...@gmail.com>.
I would not wish to move away from Java RE *unless* the RE syntax was
the same *and* the implementation was better performing *and* the
existing code suffered from poor performance.

It might be OK if the alternate implementation was missing some
esoteric features, but I would be very wary of using any features that
were not in the Java implementation.

The likelihood is that the Java implementation will (eventually)
become more performant, at which point it would be useful to be able
to revert to the Java version.
That requires a high degree of compatibility to reduce the work involved.

It might be more useful to produce a tool that detects inefficient RE
usage and suggests improvements.
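
A very crude sketch of what such a detection tool might start from, assuming the usual culprit is a quantified group that itself contains a quantifier, such as (a+)+. This is a heuristic only: it ignores escapes, character classes and nested groups, and it will flag some harmless patterns. All names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical lint-style check, not a real analysis of regex complexity.
class NestedQuantifierCheck {
    // Matches a parenthesised group with no inner parentheses that contains
    // a + or * and is itself followed by + or *.
    private static final Pattern NESTED =
            Pattern.compile("\\([^()]*[+*][^()]*\\)[+*]");

    // Returns true if the regex source text contains a nested quantifier.
    static boolean looksRisky(String regex) {
        return NESTED.matcher(regex).find();
    }

    public static void main(String[] args) {
        System.out.println(looksRisky("(a+)+b")); // nested quantifier
        System.out.println(looksRisky("(ab)+"));  // single quantifier only
    }
}
```

A usable tool would need to parse the pattern properly rather than pattern-match on its source text.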


On 1 February 2015 at 22:35, James Carman <ja...@carmanconsulting.com> wrote:
> To be clear, I am not advocating this approach.  I was merely trying to
> illustrate what a nightmare such an endeavor would be. :)
>
> On Sunday, February 1, 2015, James Carman <ja...@carmanconsulting.com>
> wrote:
>
>> You would basically have to pick a canonical regex language if you want a
>> facade and be able to swap the regex library out.  Most of them are very
>> similar but they are not the same.
>>
>> On Sunday, February 1, 2015, Gary Gregory <garydgregory@gmail.com> wrote:
>>
>>> I think we'll need some clear performance advantages documented as well as
>>> any compatibility issues.
>>>
>>> This begs for a facade API IMO. I would not want to recode my app just to
>>> test one vs. the other, it should be pluggable.
>>>
>>> Gary
>>>
>>> On Sat, Jan 31, 2015 at 10:58 AM, Benson Margulies <bimargulies@gmail.com>
>>> wrote:
>>>
>>> > So, once upon a time, there was a regex library here. It was retired,
>>> > presumably on the grounds that it was rendered obsolete by the JRE's
>>> > native support.
>>> >
>>> > However, the JRE's regular expressions have a pretty severe problem;
>>> > they have unbounded (or at least, very, very, bad) execution time for
>>> > some combinations of data and regex.
>>> >
>>> > To cope with this, we ported the Henry Spencer regular expression
>>> > library (as found in TCL) from C to Java.
>>> >
>>> > Thus: https://github.com/basis-technology-corp/tcl-regex-java
>>> >
>>> > Is anyone interested in this? Give or take the possible IP muddle of
>>> > the original C Code, I could grant it easily.
>>> >


Re: Anyone interested in regular expressions, again?

Posted by Gary Gregory <ga...@gmail.com>.
SQL is a language but it differs wildly from DBMS to DBMS. Good luck
writing a real application that uses _only_ SQL92 constructs...

Gary

On Mon, Feb 2, 2015 at 9:16 AM, James Carman <ja...@carmanconsulting.com>
wrote:

> With JDBC you have a common language, SQL.  Unless you're saying you pick
> the Java regex language as the standard and adapt to the others, you would
> have to come up with (or choose) another regex language.  My point is that
> creating a facade (a la commons-logging or slf4j) for regex would be
> troublesome because you have to support the same exact regex regardless of
> the underlying implementation and that would most likely involve some
> translation, since the regex libraries out there do not support the same
> syntax.  Merely coming up with the API is not difficult at all.  Perhaps
> you did not mean to provide a general purpose façade, but only one that
> supports the two libraries in question (assuming this other library
> supports Java regex)?
>
> On Monday, February 2, 2015, Gary Gregory <ga...@gmail.com> wrote:
>
> > On Sun, Feb 1, 2015 at 5:35 PM, James Carman <james@carmanconsulting.com>
> > wrote:
> >
> > > To be clear, I am not advocating this approach.  I was merely trying to
> > > illustrate what a nightmare such an endeavor would be. :)
> > >
> > > On Sunday, February 1, 2015, James Carman <james@carmanconsulting.com>
> > > wrote:
> > >
> > > > You would basically have to pick a canonical regex language if you want a
> > > > facade and be able to swap the regex library out.  Most of them are very
> > > > similar but they are not the same.
> > >
> >
> > I would not need a canonical regex language. This could be like JDBC where
> > implementations vary.
> >
> > Gary
> >
> > >
> > > > On Sunday, February 1, 2015, Gary Gregory <garydgregory@gmail.com> wrote:
> > > >
> > > >> I think we'll need some clear performance advantages documented as well as
> > > >> any compatibility issues.
> > > >>
> > > >> This begs for a facade API IMO. I would not want to recode my app just to
> > > >> test one vs. the other, it should be pluggable.
> > > >>
> > > >> Gary
> > > >>
> > > >> On Sat, Jan 31, 2015 at 10:58 AM, Benson Margulies <bimargulies@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > So, once upon a time, there was a regex library here. It was retired,
> > > >> > presumably on the grounds that it was rendered obsolete by the JRE's
> > > >> > native support.
> > > >> >
> > > >> > However, the JRE's regular expressions have a pretty severe problem;
> > > >> > they have unbounded (or at least, very, very, bad) execution time for
> > > >> > some combinations of data and regex.
> > > >> >
> > > >> > To cope with this, we ported the Henry Spencer regular expression
> > > >> > library (as found in TCL) from C to Java.
> > > >> >
> > > >> > Thus: https://github.com/basis-technology-corp/tcl-regex-java
> > > >> >
> > > >> > Is anyone interested in this? Give or take the possible IP muddle of
> > > >> > the original C Code, I could grant it easily.
> > > >> >



-- 
E-Mail: garydgregory@gmail.com | ggregory@apache.org
Java Persistence with Hibernate, Second Edition
<http://www.manning.com/bauer3/>
JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
Spring Batch in Action <http://www.manning.com/templier/>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory

Re: Anyone interested in regular expressions, again?

Posted by James Carman <ja...@carmanconsulting.com>.
With JDBC you have a common language, SQL.  Unless you're saying you pick
the Java regex language as the standard and adapt to the others, you would
have to come up with (or choose) another regex language.  My point is that
creating a facade (a la commons-logging or slf4j) for regex would be
troublesome because you have to support the same exact regex regardless of
the underlying implementation and that would most likely involve some
translation, since the regex libraries out there do not support the same
syntax.  Merely coming up with the API is not difficult at all.  Perhaps
you did not mean to provide a general purpose façade, but only one that
supports the two libraries in question (assuming this other library
supports Java regex)?
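
To illustrate the point that the API itself is the easy part, here is a minimal sketch of such a facade with one implementation backed by java.util.regex. Every interface and class name below is hypothetical; nothing here comes from an existing library.

```java
import java.util.regex.Pattern;

// Hypothetical facade: an engine turns regex source text into a matcher.
interface RegexEngine {
    CompiledRegex compile(String regex);
}

// Hypothetical compiled form, reduced to a single whole-input match.
interface CompiledRegex {
    boolean matches(CharSequence input);
}

// Implementation backed by the JDK's own engine.
class JdkRegexEngine implements RegexEngine {
    @Override
    public CompiledRegex compile(String regex) {
        Pattern p = Pattern.compile(regex);
        return input -> p.matcher(input).matches();
    }
}

class FacadeDemo {
    // Caller code depends only on the facade, not on a specific engine.
    static boolean isIsoDate(RegexEngine engine, String s) {
        return engine.compile("[0-9]{4}-[0-9]{2}-[0-9]{2}").matches(s);
    }

    public static void main(String[] args) {
        System.out.println(isIsoDate(new JdkRegexEngine(), "2015-02-02"));
    }
}
```

The hard part is the string passed to compile: the facade says nothing about which syntax it is written in, so each implementation would either have to accept the same syntax or translate it.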

On Monday, February 2, 2015, Gary Gregory <ga...@gmail.com> wrote:

> On Sun, Feb 1, 2015 at 5:35 PM, James Carman <james@carmanconsulting.com>
> wrote:
>
> > To be clear, I am not advocating this approach.  I was merely trying to
> > illustrate what a nightmare such an endeavor would be. :)
> >
> > On Sunday, February 1, 2015, James Carman <james@carmanconsulting.com>
> > wrote:
> >
> > > You would basically have to pick a canonical regex language if you want a
> > > facade and be able to swap the regex library out.  Most of them are very
> > > similar but they are not the same.
> >
>
> I would not need a canonical regex language. This could be like JDBC where
> implementations vary.
>
> Gary
>
> >
> > > On Sunday, February 1, 2015, Gary Gregory <garydgregory@gmail.com> wrote:
> > >
> > >> I think we'll need some clear performance advantages documented as well as
> > >> any compatibility issues.
> > >>
> > >> This begs for a facade API IMO. I would not want to recode my app just to
> > >> test one vs. the other, it should be pluggable.
> > >>
> > >> Gary
> > >>
> > >> On Sat, Jan 31, 2015 at 10:58 AM, Benson Margulies <bimargulies@gmail.com>
> > >> wrote:
> > >>
> > >> > So, once upon a time, there was a regex library here. It was retired,
> > >> > presumably on the grounds that it was rendered obsolete by the JRE's
> > >> > native support.
> > >> >
> > >> > However, the JRE's regular expressions have a pretty severe problem;
> > >> > they have unbounded (or at least, very, very, bad) execution time for
> > >> > some combinations of data and regex.
> > >> >
> > >> > To cope with this, we ported the Henry Spencer regular expression
> > >> > library (as found in TCL) from C to Java.
> > >> >
> > >> > Thus: https://github.com/basis-technology-corp/tcl-regex-java
> > >> >
> > >> > Is anyone interested in this? Give or take the possible IP muddle of
> > >> > the original C Code, I could grant it easily.
> > >> >

Re: Anyone interested in regular expressions, again?

Posted by Gary Gregory <ga...@gmail.com>.
On Sun, Feb 1, 2015 at 5:35 PM, James Carman <ja...@carmanconsulting.com>
wrote:

> To be clear, I am not advocating this approach.  I was merely trying to
> illustrate what a nightmare such an endeavor would be. :)
>
> On Sunday, February 1, 2015, James Carman <ja...@carmanconsulting.com>
> wrote:
>
> > You would basically have to pick a canonical regex language if you want a
> > facade and be able to swap the regex library out.  Most of them are very
> > similar but they are not the same.
>

I would not need a canonical regex language. This could be like JDBC where
implementations vary.

Gary

>
> > On Sunday, February 1, 2015, Gary Gregory <garydgregory@gmail.com> wrote:
> >
> >> I think we'll need some clear performance advantages documented as well as
> >> any compatibility issues.
> >>
> >> This begs for a facade API IMO. I would not want to recode my app just to
> >> test one vs. the other, it should be pluggable.
> >>
> >> Gary
> >>
> >> On Sat, Jan 31, 2015 at 10:58 AM, Benson Margulies <bimargulies@gmail.com>
> >> wrote:
> >>
> >> > So, once upon a time, there was a regex library here. It was retired,
> >> > presumably on the grounds that it was rendered obsolete by the JRE's
> >> > native support.
> >> >
> >> > However, the JRE's regular expressions have a pretty severe problem;
> >> > they have unbounded (or at least, very, very, bad) execution time for
> >> > some combinations of data and regex.
> >> >
> >> > To cope with this, we ported the Henry Spencer regular expression
> >> > library (as found in TCL) from C to Java.
> >> >
> >> > Thus: https://github.com/basis-technology-corp/tcl-regex-java
> >> >
> >> > Is anyone interested in this? Give or take the possible IP muddle of
> >> > the original C Code, I could grant it easily.
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >
>




Re: Anyone interested in regular expressions, again?

Posted by James Carman <ja...@carmanconsulting.com>.
To be clear, I am not advocating this approach.  I was merely trying to
illustrate what a nightmare such an endeavor would be. :)

On Sunday, February 1, 2015, James Carman <ja...@carmanconsulting.com>
wrote:

> You would basically have to pick a canonical regex language if you want a
> facade and be able to swap the regex library out.  Most of them are very
> similar but they are not the same.
>
> On Sunday, February 1, 2015, Gary Gregory <garydgregory@gmail.com> wrote:
>
>> I think we'll need some clear performance advantages documented as well as
>> any compatibility issues.
>>
>> This begs for a facade API IMO. I would not want to recode my app just to
>> test one vs. the other, it should be pluggable.
>>
>> Gary
>>
>> On Sat, Jan 31, 2015 at 10:58 AM, Benson Margulies <bimargulies@gmail.com>
>> wrote:
>>
>> > So, once upon a time, there was a regex library here. It was retired,
>> > presumably on the grounds that it was rendered obsolete by the JRE's
>> > native support.
>> >
>> > However, the JRE's regular expressions have a pretty severe problem;
>> > they have unbounded (or at least, very, very, bad) execution time for
>> > some combinations of data and regex.
>> >
>> > To cope with this, we ported the Henry Spencer regular expression
>> > library (as found in TCL) from C to Java.
>> >
>> > Thus: https://github.com/basis-technology-corp/tcl-regex-java
>> >
>> > Is anyone interested in this? Give or take the possible IP muddle of
>> > the original C Code, I could grant it easily.
>> >
>> >
>> >
>>
>>
>>
>

Re: Anyone interested in regular expressions, again?

Posted by Benson Margulies <bi...@gmail.com>.
Are you all familiar with the expression, 'excessive fascination with
the Apache brand?' Here I am expressing the exact opposite. If there
are people out there reading this who have a potential use for this
thing, I encourage them to kick its tires. If that leads to some
enthusiasm, I'm willing to do the work to make it into a contribution.
I am not trying to persuade the commons PMC to accept it just on my
say-so.  You will find that the API is within a tiny amount of code of
being facaded to match the JRE API, but, as noted, the language
differences amongst the various libraries make this of questionable
value. Mapping between them is far from easy -- unless you want to
start from this code, and then write an NFA->regex converter for the
various syntaxes!



Re: Anyone interested in regular expressions, again?

Posted by James Carman <ja...@carmanconsulting.com>.
You would basically have to pick a canonical regex language if you want a
facade and be able to swap the regex library out.  Most of them are very
similar but they are not the same.
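
One concrete difference of this kind, sketched with java.util.regex only: Java tries alternatives left to right and keeps the first that succeeds, whereas POSIX specifies that the longest leftmost match wins. The class and method names below are made up for illustration, and how any other particular library (including tcl-regex-java) treats these expressions is not checked here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AlternationDemo {
    // Returns the first match of the expression in the input, or null if there is none.
    static String firstMatch(String expression, String input) {
        Matcher m = Pattern.compile(expression).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // java.util.regex stops at the first alternative that matches.
        System.out.println(firstMatch("a|ab", "ab")); // prints "a"
        // Reordering the alternatives changes the result.
        System.out.println(firstMatch("ab|a", "ab")); // prints "ab"
    }
}
```

Under leftmost-longest rules both expressions would report "ab", so the same expression string can return different matches depending on the engine behind a facade.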

On Sunday, February 1, 2015, Gary Gregory <ga...@gmail.com> wrote:

> I think we'll need some clear performance advantages documented as well as
> any compatibility issues.
>
> This begs for a facade API IMO. I would not want to recode my app just to
> test one vs. the other, it should be pluggable.
>
> Gary
>
> On Sat, Jan 31, 2015 at 10:58 AM, Benson Margulies <bimargulies@gmail.com>
> wrote:
>
> > So, once upon a time, there was a regex library here. It was retired,
> > presumably on the grounds that it was rendered obsolete by the JRE's
> > native support.
> >
> > However, the JRE's regular expressions have a pretty severe problem;
> > they have unbounded (or at least, very, very, bad) execution time for
> > some combinations of data and regex.
> >
> > To cope with this, we ported the Henry Spencer regular expression
> > library (as found in TCL) from C to Java.
> >
> > Thus: https://github.com/basis-technology-corp/tcl-regex-java
> >
> > Is anyone interested in this? Give or take the possible IP muddle of
> > the original C Code, I could grant it easily.
> >
> >
> >
>
>
>

Re: Anyone interested in regular expressions, again?

Posted by Gary Gregory <ga...@gmail.com>.
I think we'll need some clear performance advantages documented as well as
any compatibility issues.

This begs for a facade API IMO. I would not want to recode my app just to
test one vs. the other, it should be pluggable.

Gary

On Sat, Jan 31, 2015 at 10:58 AM, Benson Margulies <bi...@gmail.com>
wrote:

> So, once upon a time, there was a regex library here. It was retired,
> presumably on the grounds that it was rendered obsolete by the JRE's
> native support.
>
> However, the JRE's regular expressions have a pretty severe problem;
> they have unbounded (or at least, very, very, bad) execution time for
> some combinations of data and regex.
>
> To cope with this, we ported the Henry Spencer regular expression
> library (as found in TCL) from C to Java.
>
> Thus: https://github.com/basis-technology-corp/tcl-regex-java
>
> Is anyone interested in this? Give or take the possible IP muddle of
> the original C Code, I could grant it easily.
>
>
>



Re: Anyone interested in regular expressions, again?

Posted by Jacques Le Roux <ja...@les7arts.com>.
We still rely on a kind of "Apache ORO" in OFBiz, specifically jakarta-oro-2.0.8.jar: http://jakarta.apache.org/site/news/news-2010-q3.html#20100901.2

While working on https://issues.apache.org/jira/browse/OFBIZ-5395 (out of scope here),
I noted the same benchmark reference as Bruno.
See the end of the OFBIZ-5395 description for this note and a remark about the ORO cache: "with its cache, CompilerMatcher is more than an interesting
alternative to regular Java regex."
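
For comparison, a comparable cache of compiled patterns can be sketched on top of java.util.regex. This is not the OFBiz CompilerMatcher class or any ORO API; PatternCache and its methods are hypothetical names, and the cache is unbounded, which a real deployment would probably want to change.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.regex.Pattern;

// Hypothetical cache: compiles each distinct expression once and reuses it.
final class PatternCache {
    private final ConcurrentMap<String, Pattern> cache = new ConcurrentHashMap<>();

    // Returns the cached Pattern, compiling it on first use.
    Pattern get(String expression) {
        return cache.computeIfAbsent(expression, Pattern::compile);
    }

    // Convenience wrapper: full match of the input against a cached pattern.
    boolean matches(String expression, CharSequence input) {
        return get(expression).matcher(input).matches();
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        PatternCache patterns = new PatternCache();
        System.out.println(patterns.matches("[a-z]+", "oro"));
        System.out.println(patterns.matches("[a-z]+", "ORO"));
        System.out.println("compiled patterns: " + patterns.size());
    }
}
```

Pattern instances are immutable and safe to share between threads; Matcher instances are not, which is why the cache hands out patterns and creates a fresh matcher per call.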

Jacques

On 31/01/2015 18:29, Benson Margulies wrote:
> On Sat, Jan 31, 2015 at 12:22 PM, Bruno P. Kinoshita
> <br...@yahoo.com.br> wrote:
>> Hi Benson!
>> I wouldn't be able to help at the moment, but some years ago I had a performance issue with regexes in a Nutch crawler [1] and found out about this other library that I think you mentioned. Are you talking about ORO?
> Yes, I believe I'm referring to ORO. I'm not really even looking for
> help. I am looking to see if there is enough interest to justify
> exploring pushing the code into the ASF. We did benchmarks, it's
> faster than built-in Java in a variety of cases. We are precluded from
> using GPL, so we didn't look seriously at OpenRegex. We want to have a
> system where outsiders can supply any regex they like and we don't
> have to worry about one of our servers being eaten by it.
>
>
>> I ended up changing the regex and never had a chance to play with ORO or other libraries to see if there was any advantage over using the JRE's regex API. Recently I had another performance problem, this time with Apache Hive SerDe, and fixed it by changing the storage format and simplifying the regex.
>> Have you done any performance comparison between your code and other libraries? More or less like this [2]? Maybe this library could be used as an alternative in Nutch, Commons Crawl or other projects where performance is important.
>> Lastly, I'm using OpenRegex (GPLv3) [3] in a project, in combination with Apache OpenNLP. It is a "regular expression language and engine" that users can use to match strings and NLP tags. For example:
>> <string='My Company'> <lemma='be'> <postag='RB'>* (<adjective>: <postag='JJ'>)
>> Where <lemma='be'> will match any form of be/is/was/were/etc., <postag='RB'>* matches zero or more adverbs, and the last part of the expression captures a named token "adjective" (JJ is the Penn Treebank part-of-speech tag for adjectives).
>> Not sure if your library will work only with text or will support other approaches too. OpenRegex has some TODOs in the GitHub wiki but hasn't been updated in a while. Maybe if your library could work similarly to OpenRegex, it could be incorporated into Apache OpenNLP too. Even the LanguageTool team showed some interest in experimenting with it [4].
>> Just food for thought :-)
>> Bruno
>> [1] https://issues.apache.org/jira/browse/NUTCH-1014
>> [2] http://tusker.org/regex/regex_benchmark.html
>> [3] https://github.com/knowitall/openregex
>> [4] http://sourceforge.net/p/languagetool/mailman/languagetool-devel/thread/69f229c0a58d3245d511dafaa82feafc%40danielnaber.de/#msg31280519
>>
>>        From: Benson Margulies <bi...@gmail.com>
>>   To: Commons Developers List <de...@commons.apache.org>
>>   Sent: Saturday, January 31, 2015 1:58 PM
>>   Subject: Anyone interested in regular expressions, again?
>>
>> So, once upon a time, there was a regex library here. It was retired,
>> presumably on the grounds that it was rendered obsolete by the JRE's
>> native support.
>>
>> However, the JRE's regular expressions have a pretty severe problem;
>> they have unbounded (or at least, very, very, bad) execution time for
>> some combinations of data and regex.
>>
>> To cope with this, we ported the Henry Spencer regular expression
>> library (as found in TCL) from C to Java.
>>
>> Thus: https://github.com/basis-technology-corp/tcl-regex-java
>>
>> Is anyone interested in this? Give or take the possible IP muddle of
>> the original C Code, I could grant it easily.
>>
>>
>>
>>
>>
>
>
>



Re: Anyone interested in regular expressions, again?

Posted by Benson Margulies <bi...@gmail.com>.
On Sat, Jan 31, 2015 at 12:22 PM, Bruno P. Kinoshita
<br...@yahoo.com.br> wrote:
> Hi Benson!
> I wouldn't be able to help at the moment, but some years ago I had a performance issue with regexes in a Nutch crawler [1] and found out about this other library that I think you mentioned. Are you talking about ORO?

Yes, I believe I'm referring to ORO. I'm not really even looking for
help. I am looking to see if there is enough interest to justify
exploring pushing the code into the ASF. We did benchmarks, it's
faster than built-in Java in a variety of cases. We are precluded from
using GPL, so we didn't look seriously at OpenRegex. We want to have a
system where outsiders can supply any regex they like and we don't
have to worry about one of our servers being eaten by it.
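
For readers who stay on java.util.regex, one known workaround for untrusted expressions is to wrap the input in a CharSequence that checks a deadline on every character read, so a runaway match can be aborted. The sketch below is only that workaround, with hypothetical class names; it is not how the ported library works. The expression (a+)+b is a commonly cited backtracking-heavy case, and whether it actually hits the deadline depends on the JRE version.

```java
import java.util.regex.Pattern;

// Hypothetical helper types; none of these names exist in the JDK or in tcl-regex-java.
final class RegexTimeoutException extends RuntimeException {
    RegexTimeoutException(String message) { super(message); }
}

// Wraps the input so that every character read checks a deadline.
final class DeadlineCharSequence implements CharSequence {
    private final CharSequence inner;
    private final long deadlineNanos;

    DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
        this.inner = inner;
        this.deadlineNanos = deadlineNanos;
    }

    @Override public char charAt(int index) {
        // Overflow-safe comparison of System.nanoTime() values.
        if (System.nanoTime() - deadlineNanos > 0) {
            throw new RegexTimeoutException("regex evaluation exceeded its time budget");
        }
        return inner.charAt(index);
    }

    @Override public int length() { return inner.length(); }

    @Override public CharSequence subSequence(int start, int end) {
        return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
    }

    @Override public String toString() { return inner.toString(); }
}

final class BoundedRegex {
    // Runs a full match; throws RegexTimeoutException once the budget is spent.
    static boolean matchesWithin(Pattern pattern, CharSequence input, long budgetMillis) {
        long deadline = System.nanoTime() + budgetMillis * 1_000_000L;
        return pattern.matcher(new DeadlineCharSequence(input, deadline)).matches();
    }

    public static void main(String[] args) {
        StringBuilder input = new StringBuilder();
        for (int i = 0; i < 40; i++) {
            input.append('a');
        }
        try {
            boolean matched = matchesWithin(Pattern.compile("(a+)+b"), input, 200);
            System.out.println("finished, matched=" + matched);
        } catch (RegexTimeoutException e) {
            System.out.println("aborted: " + e.getMessage());
        }
    }
}
```

This bounds the damage rather than removing the cause: the match still burns CPU up to the budget, which is the argument for an engine whose matching time does not blow up in the first place.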


> I ended up changing the regex and never had a chance to play with ORO or other libraries to see if there was any advantage over using the JRE's regex API. Recently I had another performance problem, this time with Apache Hive SerDe, and fixed it by changing the storage format and simplifying the regex.
> Have you done any performance comparison between your code and other libraries? More or less like this [2]? Maybe this library could be used as an alternative in Nutch, Commons Crawl or other projects where performance is important.
> Lastly, I'm using OpenRegex (GPLv3) [3] in a project, in combination with Apache OpenNLP. It is a "regular expression language and engine" that users can use to match strings and NLP tags. For example:
> <string='My Company'> <lemma='be'> <postag='RB'>* (<adjective>: <postag='JJ'>)
> Where <lemma='be'> will match any form of be/is/was/were/etc., <postag='RB'>* matches zero or more adverbs, and the last part of the expression captures a named token "adjective" (JJ is the Penn Treebank part-of-speech tag for adjectives).
> Not sure if your library will work only with text or will support other approaches too. OpenRegex has some TODOs in the GitHub wiki but hasn't been updated in a while. Maybe if your library could work similarly to OpenRegex, it could be incorporated into Apache OpenNLP too. Even the LanguageTool team showed some interest in experimenting with it [4].
> Just food for thought :-)
> Bruno
> [1] https://issues.apache.org/jira/browse/NUTCH-1014
> [2] http://tusker.org/regex/regex_benchmark.html
> [3] https://github.com/knowitall/openregex
> [4] http://sourceforge.net/p/languagetool/mailman/languagetool-devel/thread/69f229c0a58d3245d511dafaa82feafc%40danielnaber.de/#msg31280519
>
>       From: Benson Margulies <bi...@gmail.com>
>  To: Commons Developers List <de...@commons.apache.org>
>  Sent: Saturday, January 31, 2015 1:58 PM
>  Subject: Anyone interested in regular expressions, again?
>
> So, once upon a time, there was a regex library here. It was retired,
> presumably on the grounds that it was rendered obsolete by the JRE's
> native support.
>
> However, the JRE's regular expressions have a pretty severe problem;
> they have unbounded (or at least, very, very, bad) execution time for
> some combinations of data and regex.
>
> To cope with this, we ported the Henry Spencer regular expression
> library (as found in TCL) from C to Java.
>
> Thus: https://github.com/basis-technology-corp/tcl-regex-java
>
> Is anyone interested in this? Give or take the possible IP muddle of
> the original C Code, I could grant it easily.
>
>
>
>
>



Re: Anyone interested in regular expressions, again?

Posted by "Bruno P. Kinoshita" <br...@yahoo.com.br>.
Hi Benson!
I wouldn't be able to help at the moment, but some years ago I had a performance issue with regexes in a Nutch crawler [1] and found out about this other library that I think you mentioned. Are you talking about ORO?
I ended up changing the regex and never had a chance to play with ORO or other libraries to see if there was any advantage over using the JRE's regex API. Recently I had another performance problem, this time with Apache Hive SerDe, and fixed it by changing the storage format and simplifying the regex.
Have you done any performance comparison between your code and other libraries? More or less like this [2]? Maybe this library could be used as an alternative in Nutch, Commons Crawl or other projects where performance is important.
Lastly, I'm using OpenRegex (GPLv3) [3] in a project, in combination with Apache OpenNLP. It is a "regular expression language and engine" that users can use to match strings and NLP tags. For example:
<string='My Company'> <lemma='be'> <postag='RB'>* (<adjective>: <postag='JJ'>)
Where <lemma='be'> will match any form of be/is/was/were/etc., <postag='RB'>* matches zero or more adverbs, and the last part of the expression captures a named token "adjective" (JJ is the Penn Treebank part-of-speech tag for adjectives).
Not sure if your library will work only with text or will support other approaches too. OpenRegex has some TODOs in the GitHub wiki but hasn't been updated in a while. Maybe if your library could work similarly to OpenRegex, it could be incorporated into Apache OpenNLP too. Even the LanguageTool team showed some interest in experimenting with it [4].
Just food for thought :-)
Bruno
[1] https://issues.apache.org/jira/browse/NUTCH-1014
[2] http://tusker.org/regex/regex_benchmark.html
[3] https://github.com/knowitall/openregex
[4] http://sourceforge.net/p/languagetool/mailman/languagetool-devel/thread/69f229c0a58d3245d511dafaa82feafc%40danielnaber.de/#msg31280519
 
      From: Benson Margulies <bi...@gmail.com>
 To: Commons Developers List <de...@commons.apache.org> 
 Sent: Saturday, January 31, 2015 1:58 PM
 Subject: Anyone interested in regular expressions, again?
   
So, once upon a time, there was a regex library here. It was retired,
presumably on the grounds that it was rendered obsolete by the JRE's
native support.

However, the JRE's regular expressions have a pretty severe problem;
they have unbounded (or at least, very, very, bad) execution time for
some combinations of data and regex.

To cope with this, we ported the Henry Spencer regular expression
library (as found in TCL) from C to Java.

Thus: https://github.com/basis-technology-corp/tcl-regex-java

Is anyone interested in this? Give or take the possible IP muddle of
the original C Code, I could grant it easily.
