Posted to dev@subversion.apache.org by Hyrum K Wright <hy...@hyrumwright.org> on 2012/10/17 18:20:20 UTC

Regular expressions in Subversion

There are several places where regular expressions would be useful in
Subversion.  Off hand, the new log --search feature and svn:ignore
properties feel like they'd be good candidates for regexes, and they
could probably also apply to authz rules eventually.  I'm sure there
are more.

Historically, the argument against using regexes in Subversion was
that they would be a potential DoS target, or could lead to unexpected
performance problems.  However, I recently ran across a new regex
engine, RE2, which claims to have linear time complexity in the size
of the input with the ability to also limit memory consumption[1].
These come at the expense of a couple of less-used regex features,
and it feels like it'd be a good fit for Subversion.
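
To make the DoS concern concrete, here is a minimal Python sketch that
contrasts a naive backtracking matcher with a matcher that tracks the set
of reachable pattern positions, which is the general idea behind
automaton-based engines. It is a toy under stated assumptions, not RE2's
actual implementation: it only understands literals, '.', and 'x*', and
the function names are made up for this illustration.

```python
def parse(pattern):
    """Split a pattern into (char, starred) tokens: literals, '.', and 'x*'."""
    tokens = []
    i = 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        tokens.append((pattern[i], starred))
        i += 2 if starred else 1
    return tokens


def backtrack_match(tokens, text):
    """Full match by naive backtracking; returns (matched, steps)."""
    steps = 0

    def go(ti, si):
        nonlocal steps
        steps += 1
        if ti == len(tokens):
            return si == len(text)
        ch, starred = tokens[ti]
        fits = si < len(text) and ch in (".", text[si])
        if starred:
            # Either skip the starred token, or consume one character and retry it.
            return go(ti + 1, si) or (fits and go(ti, si + 1))
        return fits and go(ti + 1, si + 1)

    return go(0, 0), steps


def simulate_match(tokens, text):
    """Full match by tracking all reachable token positions; returns (matched, steps)."""
    steps = 0

    def closure(states):
        # A starred token may match zero times, so the next position is reachable too.
        out = set()
        for s in states:
            out.add(s)
            while s < len(tokens) and tokens[s][1]:
                s += 1
                out.add(s)
        return out

    current = closure({0})
    for c in text:
        nxt = set()
        for s in current:
            steps += 1
            if s < len(tokens) and tokens[s][0] in (".", c):
                nxt.add(s if tokens[s][1] else s + 1)
        current = closure(nxt)
    return len(tokens) in current, steps


tokens = parse("a*a*a*a*b")
text = "a" * 20  # no trailing 'b', so every attempt must fail
bt_result, bt_steps = backtrack_match(tokens, text)
sim_result, sim_steps = simulate_match(tokens, text)
print(bt_result, bt_steps)
print(sim_result, sim_steps)
```

The state-set matcher does at most (number of tokens + 1) steps per input
character, so its work grows linearly with the input, while the
backtracking matcher's step count grows much faster on this kind of
pattern.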

There are a few downsides:
 * RE2 is written in C++; we'd need a C wrapper to use it within Subversion.
 * RE2 packages don't exist for a number of platforms, though we might
be able to embed it in Subversion.
 * RE2 doesn't claim to compile on Windows. :)

Anyway, I was just wondering what folks' feelings were about this
possibility, and whether it's finally time to start thinking about
proper regex support within Subversion.

-Hyrum

[1] https://code.google.com/p/re2/

Re: Regular expressions in Subversion

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Hyrum K Wright wrote on Wed, Oct 17, 2012 at 13:46:34 -0400:
> On Wed, Oct 17, 2012 at 1:22 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> > Hyrum K Wright wrote on Wed, Oct 17, 2012 at 12:20:20 -0400:
> >> These come at the expense of a couple of less-used regex features,
> >
> > Please be objective/specific: it doesn't support backreferences and
> > zero-width lookarounds.
> 
> Correct (sorry for the hand waving).  I envision our use cases being
> more pattern matching-like, and not search-and-replace.
> 

Agreed, but how is that relevant?  Does that library not support some
feature that is useful for search-and-replace?  I think it does support
capturing groups...

> -Hyrum

Re: Regular expressions in Subversion

Posted by Hyrum K Wright <hy...@hyrumwright.org>.
On Wed, Oct 17, 2012 at 1:22 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> Hyrum K Wright wrote on Wed, Oct 17, 2012 at 12:20:20 -0400:
>> These come at the expense of a couple of less-used regex features,
>
> Please be objective/specific: it doesn't support backreferences and
> zero-width lookarounds.

Correct (sorry for the hand waving).  I envision our use cases being
more pattern matching-like, and not search-and-replace.

-Hyrum

Re: Regular expressions in Subversion

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Hyrum K Wright wrote on Wed, Oct 17, 2012 at 12:20:20 -0400:
> These come at the expense of a couple of less-used regex features,

Please be objective/specific: it doesn't support backreferences and
zero-width lookarounds.
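
For readers unfamiliar with the two feature families named above, here is
a small illustration using Python's re module, a backtracking engine that
does support them. The sample strings are invented for this sketch; the
last pattern uses only plain capturing groups, which need neither feature.

```python
import re

# Backreference: \1 must repeat whatever group 1 captured.
doubled = re.search(r"\b(\w+) \1\b", "it is is a typo")

# Zero-width lookahead: match a stem only if '.txt' follows, without consuming it.
stems = re.findall(r"\w+(?=\.txt)", "a.txt b.c notes.txt")

# Plain capturing groups: no backreference or lookaround involved.
parts = re.match(r"(\w+)\.(\w+)", "notes.txt")

print(doubled.group(0))
print(stems)
print(parts.groups())
```

Pattern matching of the kind discussed in this thread (log messages, file
names) typically looks like the third case rather than the first two.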

RE: Regular expressions in Subversion

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Hyrum K Wright [mailto:hyrum@hyrumwright.org]
> Sent: woensdag 17 oktober 2012 18:20
> To: Subversion Development
> Subject: Regular expressions in Subversion
> 
> There are several places where regular expressions would be useful in
> Subversion.  Off hand, the new log --search feature and svn:ignore
> properties feel like they'd be good candidates for regexes, and they
> could probably also apply to authz rules eventually.  I'm sure there
> are more.
> 
> Historically, the argument against using regexes in Subversion was
> that they would be a potential DoS target, or could lead to unexpected
> performance problems.  However, I recently ran across a new regex
> engine, RE2, which claims to have linear time complexity in the size
> of the input with the ability to also limit memory consumption[1].
> These come at the expense of a couple of less-used regex features,
> and it feels like it'd be a good fit for Subversion.
> 
> There are a few downsides:
>  * RE2 is written in C++; we'd need a C wrapper to use it within Subversion.
>  * RE2 packages don't exist for a number of platforms, though we might
> be able to embed it in Subversion.
>  * RE2 doesn't claim to compile on Windows. :)

Just checking the RE2 site, it appears that RE2 should work with Visual
Studio 2008 and later.

If/when APR and HTTPD switch to these 'recent' versions there should be no
problem in getting this supported on both server and client side.

Last time I checked, the httpd 2.2.x binaries were still built with the
1998 / 6.0 version of the Visual C++ compiler, and the RE2 project explicitly
says it is incompatible with compiler versions that old.

I would guess that CollabNet, WANdisco and VisualSVN use modern compilers
for the entire product chain by now, so maybe we can stop caring about the
default binaries.

	Bert


Re: Regular expressions in Subversion

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Stefan Sperling wrote on Wed, Oct 17, 2012 at 20:55:07 +0200:
> We could change log --search from glob to regex before release, I suppose.
> But I myself don't really miss regex support in log --search.

Or we could require the argument to --search to be of the form
/^glob:.*$/.
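
A minimal sketch of how such a prefixed argument could be dispatched,
assuming a "kind:pattern" form. Only the "glob:" prefix comes from the
suggestion above; the "regex:" prefix and the function name
compile_search_pattern are hypothetical, added to show how a second
syntax could be introduced later without ambiguity.

```python
import fnmatch
import re


def compile_search_pattern(arg):
    """Return a predicate for a 'kind:pattern' argument (hypothetical scheme)."""
    kind, sep, pattern = arg.partition(":")
    if not sep:
        raise ValueError("pattern must start with 'glob:' or 'regex:'")
    if kind == "glob":
        # Globs match the whole string, as shell globs do.
        return lambda text: fnmatch.fnmatchcase(text, pattern)
    if kind == "regex":
        # Hypothetical future syntax: unanchored regex search.
        compiled = re.compile(pattern)
        return lambda text: compiled.search(text) is not None
    raise ValueError("unknown pattern kind: %r" % kind)


as_glob = compile_search_pattern("glob:a*.txt")
as_regex = compile_search_pattern("regex:a*.txt")
print(as_glob("bbb.txt"), as_regex("bbb.txt"))
```

Because the prefix is mandatory, the same pattern text can never be
silently reinterpreted under a different syntax.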

Re: Regular expressions in Subversion

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Oct 17, 2012 at 08:32:18PM +0200, Stefan Küng wrote:
> In such situations, to keep the compatibility, why not just
> introduce a new property, e.g. 'svn:ignore-regex' which if set takes
> precedence over the 'svn:ignore' property?

Something similar would need to be done for log --search (if it gets
released as currently implemented, with glob syntax), the authz file,
and any other existing features that use glob syntax (are there any?).

Adding regex support only for the purpose of squeezing it into
existing features is not really worth the effort, I would guess.
It might turn out to involve more work than expected due to backwards
compatibility constraints, while I don't see anyone complaining loudly
about our lack of regex support.

I would prefer adding regex support as part of some compelling new
feature which requires regex support to function properly.

We could change log --search from glob to regex before release, I suppose.
But I myself don't really miss regex support in log --search.

Re: Regular expressions in Subversion

Posted by Stefan Küng <to...@gmail.com>.
On 17.10.2012 20:17, Stefan Sperling wrote:
> On Wed, Oct 17, 2012 at 12:20:20PM -0400, Hyrum K Wright wrote:
>> There are several places where regular expressions would be useful in
>> Subversion.  Off hand, the new log --search feature and svn:ignore
>> properties feel like they'd be good candidates for regexes, and they
>> could probably also apply to authz rules eventually.  I'm sure there
>> are more.
>
> How do we change existing features from glob syntax to regex without
> breaking compatibility? A glob pattern can of course be expressed in
> regex syntax, but the syntax isn't equivalent. Can we reliably detect
> whether a given pattern (say, on a line within svn:ignore) is a glob
> pattern or a regex?
>
> For instance, what about "a*.txt"? In glob this means: starts with 'a',
> followed by any number of characters, and ends with '.txt'. But in regex
> it means: any number of 'a' characters (including zero), then any single
> character, then 'txt', and an unanchored search such as egrep accepts that
> anywhere in the name.
>
> $ ls *.txt | egrep 'a*.txt'
> a.txt
> aaa.txt
> bbb.txt
> foo.txt
> $ ls a*.txt | egrep 'a*.txt'
> a.txt
> aaa.txt
> $
>
> (where ls uses glob syntax and egrep uses regex)

In such situations, to keep the compatibility, why not just introduce a 
new property, e.g. 'svn:ignore-regex' which if set takes precedence over 
the 'svn:ignore' property?
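
A rough sketch of the precedence rule proposed here, modelling a
directory's properties as a plain dict. The 'svn:ignore-regex' name comes
from the proposal above; the function is_ignored, the one-pattern-per-line
format for the new property, and the choice of whole-name matching are
assumptions made for this illustration, not existing Subversion behaviour.

```python
import fnmatch
import re


def is_ignored(name, props):
    """Decide whether `name` is ignored, given a dict of directory properties."""
    regex_value = props.get("svn:ignore-regex")
    if regex_value is not None:
        # The new property, if set, takes precedence over 'svn:ignore'.
        patterns = [line.strip() for line in regex_value.splitlines() if line.strip()]
        return any(re.fullmatch(p, name) for p in patterns)
    # Otherwise fall back to the existing glob-based property.
    globs = [line.strip() for line in props.get("svn:ignore", "").splitlines() if line.strip()]
    return any(fnmatch.fnmatchcase(name, g) for g in globs)


old_style = {"svn:ignore": "*.o\n*.tmp"}
new_style = {"svn:ignore": "*.o", "svn:ignore-regex": r".*\.(log|bak)"}
print(is_ignored("foo.o", old_style))
print(is_ignored("foo.o", new_style))
print(is_ignored("build.log", new_style))
```

Existing working copies that only set 'svn:ignore' keep their current
behaviour, which is the compatibility point of the proposal.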

Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.net

Re: Regular expressions in Subversion

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Oct 17, 2012 at 12:20:20PM -0400, Hyrum K Wright wrote:
> There are several places where regular expressions would be useful in
> Subversion.  Off hand, the new log --search feature and svn:ignore
> properties feel like they'd be good candidates for regexes, and they
> could probably also apply to authz rules eventually.  I'm sure there
> are more.

How do we change existing features from glob syntax to regex without
breaking compatibility? A glob pattern can of course be expressed in
regex syntax, but the syntax isn't equivalent. Can we reliably detect
whether a given pattern (say, on a line within svn:ignore) is a glob
pattern or a regex?

For instance, what about "a*.txt"? In glob this means: starts with 'a',
followed by any number of characters, and ends with '.txt'. But in regex
it means: any number of 'a' characters (including zero), then any single
character, then 'txt', and an unanchored search such as egrep accepts that
anywhere in the name.

$ ls *.txt | egrep 'a*.txt' 
a.txt
aaa.txt
bbb.txt
foo.txt
$ ls a*.txt | egrep 'a*.txt'
a.txt
aaa.txt
$ 

(where ls uses glob syntax and egrep uses regex)
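
The same comparison can be sketched in Python, using fnmatch for the glob
reading and re.search for the unanchored regex reading of the identical
pattern text. The file names are the ones from the shell session above;
fnmatch.translate is used only to show that a glob can be converted to an
equivalent regex, not the other way round.

```python
import fnmatch
import re

names = ["a.txt", "aaa.txt", "bbb.txt", "foo.txt"]
pattern = "a*.txt"

# Glob reading: 'a', then anything, then '.txt'; the whole name must match.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, pattern)]

# Regex reading, unanchored like egrep: zero or more 'a', any one character,
# then 'txt', anywhere in the name.
regex_hits = [n for n in names if re.search(pattern, n)]

# A glob can be translated into a regex that accepts the same names.
glob_as_regex = fnmatch.translate(pattern)
translated_hits = [n for n in names if re.match(glob_as_regex, n)]

print(glob_hits)
print(regex_hits)
print(translated_hits)
```

The pattern text alone does not say which reading is intended, which is
why detecting the syntax automatically is unreliable.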

Re: Regular expressions in Subversion

Posted by Branko Čibej <br...@wandisco.com>.
On 17.10.2012 13:53, Stefan Küng wrote:
> On 17.10.2012 18:20, Hyrum K Wright wrote:
>> Anyway, I was just wondering what folks' feelings were about this
>> possibility, and whether it's finally time to start thinking about
>> proper regex support within Subversion.
>
> I'm wondering: if C++ is an option, why not use the regex that comes
> with the C++ library itself?
> http://en.cppreference.com/w/cpp/regex

Because regexes are part of a C++ standard that was only published a
year ago and most compiler/library toolchains don't support them yet.
And, of course, they're not likely to have the kind of linear
performance characteristics promised by RE2.

-- Brane

-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download


Re: Regular expressions in Subversion

Posted by Stefan Küng <to...@gmail.com>.
On 17.10.2012 18:20, Hyrum K Wright wrote:
> There are several places where regular expressions would be useful in
> Subversion.  Off hand, the new log --search feature and svn:ignore
> properties feel like they'd be good candidates for regexes, and they
> could probably also apply to authz rules eventually.  I'm sure there
> are more.
>
> Historically, the argument against using regexes in Subversion was
> that they would be a potential DoS target, or could lead to unexpected
> performance problems.  However, I recently ran across a new regex
> engine, RE2, which claims to have linear time complexity in the size
> of the input with the ability to also limit memory consumption[1].
> These come at the expense of a couple of less-used regex features,
> and it feels like it'd be a good fit for Subversion.
>
> There are a few downsides:
>   * RE2 is written in C++; we'd need a C wrapper to use it within Subversion.
>   * RE2 packages don't exist for a number of platforms, though we might
> be able to embed it in Subversion.
>   * RE2 doesn't claim to compile on Windows. :)
>
> Anyway, I was just wondering what folks' feelings were about this
> possibility, and whether it's finally time to start thinking about
> proper regex support within Subversion.

I'm wondering: if C++ is an option, why not use the regex that comes 
with the C++ library itself?
http://en.cppreference.com/w/cpp/regex

Stefan
