You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Branko Čibej <br...@wandisco.com> on 2014/07/28 19:38:58 UTC

Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

On 28.07.2014 19:19, stefan2@apache.org wrote:
> Author: stefan2
> Date: Mon Jul 28 17:19:47 2014
> New Revision: 1614080
>
> URL: http://svn.apache.org/r1614080
> Log:
> On the authzperf branch: Implement the notion of path rule ordering
> by making svn_config_t iterate through sections in declaration order.
> This is done using a simple linked list because we can't remove
> sections but only add them.

No no no no!

Changing the svn_config_t structure is /not/ the right to do this. The
correct approach here is to parametrise the svn_config parser with a
constructor method, then create a new constructor for the authz parser
which will then get all entries in the file order. Please revert these
svn_config changes.

> Without support for wildcards or other patterns, the config struct
> will only contain a single section for each path.  With wildcards,
> there may be more than one.  All three of the follwing path rules
> are equally applicable:
>
>   [/foo/*.doc]
>   * = r
>
>   [/foo/bar.*]
>   * = rw
>
>   [/foo/bar.doc]
>   jrandom =
>
> To make conflicts managable, always pick the last path rule.  That
> means users should specify general rules first, followed by exceptions
> and finally (and optionally) critical rules that deny certain access,
> potentially globally.

Are you saying that a wildcard path should match before a concrete path
when it happens to appear later in the authz file? Perish the thought.
Exact matches should always override fuzzy matches, anything else is not
intuitive at all and we'll receive a ton of spurious bug reports about
how authz files don't work.

Precedence ordering is fine between identical exact matches or between
equivalent wildcard matches for the same path, but not between these two
categories.

-- Brane


-- 
Branko Čibej | Director of Subversion
WANdisco | Realising the impossibilities of Big Data
e. brane@wandisco.com

Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by Branko Čibej <br...@wandisco.com>.
On 07.08.2014 10:30, Ivan Zhakov wrote:
> On 2 August 2014 18:45, Branko Čibej <br...@wandisco.com> wrote:
>> On 02.08.2014 15:20, Alan Barrett wrote:
>>
>> On Tue, 29 Jul 2014, C. Michael Pilato wrote:
>>
>> This leaves open the possibility of a future additional match mode for
>> regular expressions (using the '^:' magic prefix) should we seek to "go
>> there".
>>
>>  # Match the trunk and branches subdirs and their children.
>>  [repos:^:^/(trunk|branches)(/.*|$)$]
>>
>>
>> As a user, I'd prefer to see keywords rather than unusual characters to
>> designate special syntax.  For example, I would prefer
>> [glob:/tags/*/private] and [regex:/tags/.*/private] to [*:/tags/*/private]
>> and [^:/tags/.*/private].
>>
>>
>> Quite understood. Unfortunately, the syntax you propose above cannot work;
>> [thing:/path] is already a valid syntax, and if we did what you propose,
>> we'd no longer be able to create rules for repositories named "glob" and
>> "regex". Worse, this could change the meaning of existing authz files.
>>
>> We could do something like this, though:
>>
>>     [:glob:/path]
>>     [:glob:repos:/path]
>>
>> In the old authz files, these rules would be invalid, because the repository
>> name part (the substring up to the first colon) cannot be empty.
>>
>> Thoughts?
>>
> I don't like the proposed syntax, but I don't why. Probably because
> many ':' is very confusing.

I agree, but I have trouble finding another syntax that is guaranteed to
not collide with any existing valid rules. I can't think of another
character other than ":" after the opening "[" that's guaranteed to make
the rule invalid by current syntax.

We could use

    [:{tag1, tag2}repos:/path]

but I think that's even worse ...


> My thoughts on this topic:
> 1. Do we really need 'regex' pattern matching? Would not be this
> overkill and performance killer?

Using "regex" instead of "glob" is not the only potential use of the
rule tag. One that we've discussed on list, for example, is rules that
must match the whole path, not just a path prefix.


> 2. Could be we make regex/glob/simple pattern matching per-file option?

That would not be a very good idea. It would require authz admins to
either not use patterns, or sacrifice the performance of///all/ lookups
by treating them as patterns, even when they're not.

And of course, it would remove the option of per-rule properties, such
as the whole-path match mentioned above.

> I'd like to avoid complexities in authz files if possible.

I think it's 10 years too late for that. :)

-- Brane


-- 
Branko Čibej | Director of Subversion
WANdisco | Realising the impossibilities of Big Data
e. brane@wandisco.com

Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On 2 August 2014 18:45, Branko Čibej <br...@wandisco.com> wrote:
> On 02.08.2014 15:20, Alan Barrett wrote:
>
> On Tue, 29 Jul 2014, C. Michael Pilato wrote:
>
> This leaves open the possibility of a future additional match mode for
> regular expressions (using the '^:' magic prefix) should we seek to "go
> there".
>
>  # Match the trunk and branches subdirs and their children.
>  [repos:^:^/(trunk|branches)(/.*|$)$]
>
>
> As a user, I'd prefer to see keywords rather than unusual characters to
> designate special syntax.  For example, I would prefer
> [glob:/tags/*/private] and [regex:/tags/.*/private] to [*:/tags/*/private]
> and [^:/tags/.*/private].
>
>
> Quite understood. Unfortunately, the syntax you propose above cannot work;
> [thing:/path] is already a valid syntax, and if we did what you propose,
> we'd no longer be able to create rules for repositories named "glob" and
> "regex". Worse, this could change the meaning of existing authz files.
>
> We could do something like this, though:
>
>     [:glob:/path]
>     [:glob:repos:/path]
>
> In the old authz files, these rules would be invalid, because the repository
> name part (the substring up to the first colon) cannot be empty.
>
> Thoughts?
>
I don't like the proposed syntax, but I don't why. Probably because
many ':' is very confusing.

My thoughts on this topic:
1. Do we really need 'regex' pattern matching? Would not be this
overkill and performance killer?
2. Could be we make regex/glob/simple pattern matching per-file option?

I'd like to avoid complexities in authz files if possible.

-- 
Ivan Zhakov
CTO | VisualSVN | http://www.visualsvn.com

Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Sat, Aug 2, 2014 at 4:45 PM, Branko Čibej <br...@wandisco.com> wrote:

>  On 02.08.2014 15:20, Alan Barrett wrote:
>
> On Tue, 29 Jul 2014, C. Michael Pilato wrote:
>
> This leaves open the possibility of a future additional match mode for
> regular expressions (using the '^:' magic prefix) should we seek to "go
> there".
>
>  # Match the trunk and branches subdirs and their children.
>  [repos:^:^/(trunk|branches)(/.*|$)$]
>
>
> As a user, I'd prefer to see keywords rather than unusual characters to
> designate special syntax.  For example, I would prefer
> [glob:/tags/*/private] and [regex:/tags/.*/private] to [*:/tags/*/private]
> and [^:/tags/.*/private].
>
>
> Quite understood. Unfortunately, the syntax you propose above cannot work;
> [thing:/path] is already a valid syntax, and if we did what you propose,
> we'd no longer be able to create rules for repositories named "glob" and
> "regex". Worse, this could change the meaning of existing authz files.
>
> We could do something like this, though:
>
>     [:glob:/path]
>     [:glob:repos:/path]
>
> In the old authz files, these rules would be invalid, because the
> repository name part (the substring up to the first colon) cannot be empty.
>
> Thoughts?
>

+1. Wouldn't win a beauty award but ticks all the right boxes for me:

* No collision with existing valid rules.
* Would be rejected as invalid by old code.
* Allows for any number of types to be invented later.
* Robust as the type is specified with the contents and has only local
scope.

-- Stefan^2.

Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by Branko Čibej <br...@wandisco.com>.
On 02.08.2014 15:20, Alan Barrett wrote:
> On Tue, 29 Jul 2014, C. Michael Pilato wrote:
>> This leaves open the possibility of a future additional match mode for
>> regular expressions (using the '^:' magic prefix) should we seek to "go
>> there".
>>
>>  # Match the trunk and branches subdirs and their children.
>>  [repos:^:^/(trunk|branches)(/.*|$)$]
>
> As a user, I'd prefer to see keywords rather than unusual characters
> to designate special syntax.  For example, I would prefer
> [glob:/tags/*/private] and [regex:/tags/.*/private] to
> [*:/tags/*/private] and [^:/tags/.*/private].

Quite understood. Unfortunately, the syntax you propose above cannot
work; [thing:/path] is already a valid syntax, and if we did what you
propose, we'd no longer be able to create rules for repositories named
"glob" and "regex". Worse, this could change the meaning of existing
authz files.

We could do something like this, though:

    [:glob:/path]
    [:glob:repos:/path]

In the old authz files, these rules would be invalid, because the
repository name part (the substring up to the first colon) cannot be empty.

Thoughts?

-- Brane

-- 
Branko Čibej | Director of Subversion
WANdisco | Realising the impossibilities of Big Data
e. brane@wandisco.com

Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by Alan Barrett <ap...@cequrux.com>.
On Tue, 29 Jul 2014, C. Michael Pilato wrote:
>This leaves open the possibility of a future additional match mode for
>regular expressions (using the '^:' magic prefix) should we seek to "go
>there".
>
>  # Match the trunk and branches subdirs and their children.
>  [repos:^:^/(trunk|branches)(/.*|$)$]

As a user, I'd prefer to see keywords rather than unusual 
characters to designate special syntax.  For example, I would 
prefer [glob:/tags/*/private] and [regex:/tags/.*/private] to 
[*:/tags/*/private] and [^:/tags/.*/private].

--apb (Alan Barrett)

Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 07/29/2014 06:46 AM, Branko Čibej wrote:
> If we want to
> support wildcards, we /have/ to invent a different syntax for wildcard
> rules. My working proposal is to add a literal '*' before the leading
> slash in the path:
> 
>     [*/fspath-pattern]
>     [repos:*/fspath-pattern]
> 
> because neither of these syntaxes are currently valid. We can invent a
> different syntax, as long as it maintains that property; and we cannot
> replace the brackets around the rule with something else, without making
> very intrusive changes in the config parser code.

I like this.  I would suggest using '*:' as your magic prefix, though.
That way, the '*' is less likely to be mistaken for part of the match
pattern itself.

  # Match 'private' directories withing the tags of all repositories.
  [*:/tags/*/private]

Also, in this mode, the match pattern needn't start with '/', right?

  # Match any item named "public" regardless of tree depth.
  [repos:*:*/public]

This leaves open the possibility of a future additional match mode for
regular expressions (using the '^:' magic prefix) should we seek to "go
there".

  # Match the trunk and branches subdirs and their children.
  [repos:^:^/(trunk|branches)(/.*|$)$]

> If we decide that the config parser is fair game (since I'm already
> changing it to optimize the construction of the in-memory authz
> representation), we could, for example, pick
> 
>     {/fspath-pattern}
>     {repos:/fspath-pattern}

Ugh.  I'd *really* prefer not to have to reimplement basic .INI parsing
for various extra-Subversion tools that read and write authz files, such
as ViewVC.


Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by Branko Čibej <br...@wandisco.com>.
On 29.07.2014 12:46, Branko Čibej wrote:
> If we decide that the config parser is fair game (since I'm already
> changing it to optimize the construction of the in-memory authz
> representation), we could, for example, pick
>     {/fspath-pattern}
>     {repos:/fspath-pattern}

... although note that in that case, the patterns could not contain a
literal brace, just as section names currently cannot contain a literal
bracket, but can contain a brace.

-- Brane


-- 
Branko Čibej | Director of Subversion
WANdisco | Realising the impossibilities of Big Data
e. brane@wandisco.com

Syntax for authz with wildcards [was: svn commit: r1614080 ...]

Posted by Julian Foad <ju...@btopenworld.com>.
(Just re-posting with a decent subject line so other people can see it.)

- Julian

Johan Corveleyn wrote:
> Branko Čibej wrote:
>> On 29.07.2014 12:25, Julian Foad wrote:
>>> Brane wrote:
>>>>     [*/projects/f*/include/*.h]
>>>>     [*/projects/*o/include/*.h]
>>> 
>>> (Ugh, I don't like the prefix '*' which I think you're using to mean
>>> 'wildcards enabled'.)
>> 
>> I don't care what prefix we use, but I do care that the rules that contain
>> wildcards have a different syntax than the rules that do not. Otherwise,
>> existing authz files (that may contain literal wildcard/pattern characters)
>> would stop working.
>> 
>> We currently support two forms of patterns:
>> 
>>     [/literal-fspath]
>>     [repos:/literal-fspath]
>> 
>> the distinguishing feature is that in both cases, the literal-fspath must
>> begin with a slash, otherwise the rule is invalid. If we want to support
>> wildcards, we have to invent a different syntax for wildcard rules. My
>> working proposal is to add a literal '*' before the leading slash in the
>> path:
>> 
>>     [*/fspath-pattern]
>>     [repos:*/fspath-pattern]
>> 
>> because neither of these syntaxes are currently valid. We can invent a
>> different syntax, as long as it maintains that property; and we cannot
>> replace the brackets around the rule with something else, without making
>> very intrusive changes in the config parser code.
> 
> Another approach might be to add an "options" section to the authz
> file (in a way that it is ignored by older servers), where one could
> specify a matching strategy (and if needed 'priority' strategy and so
> on).
> 
> [options]
> matching = literal | basic-wildcards | regex | ...
> 
> Then it's not hidden as a syntax-artifact, but more explicit ...

Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Tue, Jul 29, 2014 at 12:55 PM, Johan Corveleyn <jc...@gmail.com> wrote:
> On Tue, Jul 29, 2014 at 12:46 PM, Branko Čibej <br...@wandisco.com> wrote:
>> On 29.07.2014 12:25, Julian Foad wrote:
>>
>> Brane wrote:
>>
>> if the patterns were:
>>
>>     [*/projects/f*/include/*.h]
>>     [*/projects/*o/include/*.h]
>>
>> (Ugh, I don't like the prefix '*' which I think you're using to mean
>> 'wildcards enabled'.)
>>
>>
>> I don't care what prefix we use, but I do care that the rules that contain
>> wildcards have a different syntax than the rules that do not. Otherwise,
>> existing authz files (that may contain literal wildcard/pattern characters)
>> would stop working.
>>
>> We currently support two forms of patterns:
>>
>>     [/literal-fspath]
>>     [repos:/literal-fspath]
>>
>> the distinguishing feature is that in both cases, the literal-fspath must
>> begin with a slash, otherwise the rule is invalid. If we want to support
>> wildcards, we have to invent a different syntax for wildcard rules. My
>> working proposal is to add a literal '*' before the leading slash in the
>> path:
>>
>>     [*/fspath-pattern]
>>     [repos:*/fspath-pattern]
>>
>> because neither of these syntaxes are currently valid. We can invent a
>> different syntax, as long as it maintains that property; and we cannot
>> replace the brackets around the rule with something else, without making
>> very intrusive changes in the config parser code.
>
> Another approach might be to add an "options" section to the authz
> file (in a way that it is ignored by older servers), where one could
> specify a matching strategy (and if needed 'priority' strategy and so
> on).
>
> [options]
> matching = literal | basic-wildcards | regex | ...
>
> Then it's not hidden as a syntax-artifact, but more explicit ...

Although my current code uses the "*" prefix solution for simplicity
such that I could evaluate the inner workings of the matching engine,
I agree that we should be more explicit.

At least we need to define proper escape sequences etc.
Even simple wildcard support needs escapement and at least
_two_ special characters. Right now, we have no way to say
that a rule shall only apply to full paths instead of the whole subtree:

Rules defined for "/foo/*.h" will also apply to "/foo/byextension.h/bar/baz"
unless being overwritten for specific sub-paths.

So, a syntax like "/foo/*.h$" would be nice. Or an option on the
rule set itself that indicates this path rule as "no subtree match".
Then, it will still apply to "/foo/byextension.h" itself but none of
its files or sub-directories.

As Brane suggested, I'll update my wiki doc to match the actual
implementation and put it up for everyone to read. Then commit
the remainder of code such that people may start playing with it.

-- Stefan^2.

Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by Branko Čibej <br...@wandisco.com>.
On 29.07.2014 12:55, Johan Corveleyn wrote:
> On Tue, Jul 29, 2014 at 12:46 PM, Branko Čibej <br...@wandisco.com> wrote:
>> On 29.07.2014 12:25, Julian Foad wrote:
>>
>> Brane wrote:
>>
>> if the patterns were:
>>
>>     [*/projects/f*/include/*.h]
>>     [*/projects/*o/include/*.h]
>>
>> (Ugh, I don't like the prefix '*' which I think you're using to mean
>> 'wildcards enabled'.)
>>
>>
>> I don't care what prefix we use, but I do care that the rules that contain
>> wildcards have a different syntax than the rules that do not. Otherwise,
>> existing authz files (that may contain literal wildcard/pattern characters)
>> would stop working.
>>
>> We currently support two forms of patterns:
>>
>>     [/literal-fspath]
>>     [repos:/literal-fspath]
>>
>> the distinguishing feature is that in both cases, the literal-fspath must
>> begin with a slash, otherwise the rule is invalid. If we want to support
>> wildcards, we have to invent a different syntax for wildcard rules. My
>> working proposal is to add a literal '*' before the leading slash in the
>> path:
>>
>>     [*/fspath-pattern]
>>     [repos:*/fspath-pattern]
>>
>> because neither of these syntaxes are currently valid. We can invent a
>> different syntax, as long as it maintains that property; and we cannot
>> replace the brackets around the rule with something else, without making
>> very intrusive changes in the config parser code.
> Another approach might be to add an "options" section to the authz
> file (in a way that it is ignored by older servers), where one could
> specify a matching strategy (and if needed 'priority' strategy and so
> on).
>
> [options]
> matching = literal | basic-wildcards | regex | ...
>
> Then it's not hidden as a syntax-artifact, but more explicit ...

Yes, well, I really don't want to go overboard with options here. The
point of the exercise is to make current authz parsing a lot faster. If
we can get in some basic wildcard support that doesn't affect users who
do not use it, then fine; but confusing people with several different
pattern syntaxes is too much.

-- Brane

-- 
Branko Čibej | Director of Subversion
WANdisco | Realising the impossibilities of Big Data
e. brane@wandisco.com

Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by Johan Corveleyn <jc...@gmail.com>.
On Tue, Jul 29, 2014 at 12:46 PM, Branko Čibej <br...@wandisco.com> wrote:
> On 29.07.2014 12:25, Julian Foad wrote:
>
> Brane wrote:
>
> if the patterns were:
>
>     [*/projects/f*/include/*.h]
>     [*/projects/*o/include/*.h]
>
> (Ugh, I don't like the prefix '*' which I think you're using to mean
> 'wildcards enabled'.)
>
>
> I don't care what prefix we use, but I do care that the rules that contain
> wildcards have a different syntax than the rules that do not. Otherwise,
> existing authz files (that may contain literal wildcard/pattern characters)
> would stop working.
>
> We currently support two forms of patterns:
>
>     [/literal-fspath]
>     [repos:/literal-fspath]
>
> the distinguishing feature is that in both cases, the literal-fspath must
> begin with a slash, otherwise the rule is invalid. If we want to support
> wildcards, we have to invent a different syntax for wildcard rules. My
> working proposal is to add a literal '*' before the leading slash in the
> path:
>
>     [*/fspath-pattern]
>     [repos:*/fspath-pattern]
>
> because neither of these syntaxes are currently valid. We can invent a
> different syntax, as long as it maintains that property; and we cannot
> replace the brackets around the rule with something else, without making
> very intrusive changes in the config parser code.

Another approach might be to add an "options" section to the authz
file (in a way that it is ignored by older servers), where one could
specify a matching strategy (and if needed 'priority' strategy and so
on).

[options]
matching = literal | basic-wildcards | regex | ...

Then it's not hidden as a syntax-artifact, but more explicit ...

-- 
Johan

Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by Branko Čibej <br...@wandisco.com>.
On 29.07.2014 12:25, Julian Foad wrote:
> Brane wrote:
>> if the patterns were:
>>
>>     [*/projects/f*/include/*.h]
>>     [*/projects/*o/include/*.h]
> (Ugh, I don't like the prefix '*' which I think you're using to mean 'wildcards enabled'.)

I don't care what prefix we use, but I do care that the rules that
contain wildcards have a different syntax than the rules that do not.
Otherwise, existing authz files (that may contain literal
wildcard/pattern characters) would stop working.

We currently support two forms of patterns:

    [/literal-fspath]
    [repos:/literal-fspath]

the distinguishing feature is that in both cases, the literal-fspath
must begin with a slash, otherwise the rule is invalid. If we want to
support wildcards, we /have/ to invent a different syntax for wildcard
rules. My working proposal is to add a literal '*' before the leading
slash in the path:

    [*/fspath-pattern]
    [repos:*/fspath-pattern]

because neither of these syntaxes are currently valid. We can invent a
different syntax, as long as it maintains that property; and we cannot
replace the brackets around the rule with something else, without making
very intrusive changes in the config parser code.

If we decide that the config parser is fair game (since I'm already
changing it to optimize the construction of the in-memory authz
representation), we could, for example, pick

    {/fspath-pattern}
    {repos:/fspath-pattern}


>> where both "f*" and "*o" match "foo", we should always pick
>> whichever pattern was defined first in the authz file.
> So if the (wildcard-enabled) patterns were
>
>     /projects/*/include/*.h
>     /projects/foo_*/include/foo.h
>
> then for a path that matches both we'd always pick the pattern defined first, because in the first (most significant) path component where the patterns differ, neither provides an exact match. In this case we'd pick the one that intuitively looks a 'looser' match.

I agree, intuitively, we'd want to pick the more concrete pattern (i.e.,
one that contains some literal characters) over a less concrete pattern.
I'm not sure how to make that rule consistent and easily understood,
other than saying that a single '*' binds less strongly than any pattern
with literals in it.

> 1. Do these rules match any other existing practice that we could compare with?

I'm not aware of any such existing practice, other than WANdisco's old
access control product; which I explicitly do not want to take as an
example.

> 2. Are these rules documented in the authzperf branch yet?

Stefan^2 circulated a raw wiki page with the documentation, but I don't
know if he actually published that. Stefan^2 -- please let's see your
docs in the wiki or in BRANCH-README.

> 3. Do these rules match the behaviour Stefan is currently implementing in the authzperf branch?

I haven't reviewed that part of his changes yet. :)

-- Brane


-- 
Branko Čibej | Director of Subversion
WANdisco | Realising the impossibilities of Big Data
e. brane@wandisco.com

Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by Julian Foad <ju...@btopenworld.com>.
Brane wrote:
> if the patterns were:
>
>    [*/projects/f*/include/*.h]
>    [*/projects/*o/include/*.h]

(Ugh, I don't like the prefix '*' which I think you're using to mean 'wildcards enabled'.)

> where both "f*" and "*o" match "foo", we should always pick
> whichever pattern was defined first in the authz file.

So if the (wildcard-enabled) patterns were

    /projects/*/include/*.h
    /projects/foo_*/include/foo.h

then for a path that matches both we'd always pick the pattern defined first, because in the first (most significant) path component where the patterns differ, neither provides an exact match. In this case we'd pick the one that intuitively looks a 'looser' match.

1. Do these rules match any other existing practice that we could compare with?

2. Are these rules documented in the authzperf branch yet?

3. Do these rules match the behaviour Stefan is currently implementing in the authzperf branch?

Just wondering.

- Julian


Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by Branko Čibej <br...@wandisco.com>.
On 29.07.2014 12:06, Julian Foad wrote:
> Brane wrote:
>> Julian Foad wrote:
>>> such as:   /projects/*/include/*_private.h and   /projects/*/include/foo.h
>> I'll just note that these two patterns cannot possibly be confused,
>> since "*_private.h" and "foo.h" will never match the same file name.
> Oops, I meant 'foo_private.h' for the latter.

Ah. In that case, I think I already answered the question.

-- Brane


-- 
Branko Čibej | Director of Subversion
WANdisco | Realising the impossibilities of Big Data
e. brane@wandisco.com

Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by Julian Foad <ju...@btopenworld.com>.
Brane wrote:
> Julian Foad wrote:
>> such as:   /projects/*/include/*_private.h and   /projects/*/include/foo.h
> 
> I'll just note that these two patterns cannot possibly be confused,
> since "*_private.h" and "foo.h" will never match the same file name.

Oops, I meant 'foo_private.h' for the latter.

- Julian

Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by Branko Čibej <br...@wandisco.com>.
On 29.07.2014 09:49, Julian Foad wrote:
> Branko Čibej wrote:
>> Author: stefan2
>>> Without support for wildcards or other patterns, the config struct
>>> will only contain a single section for each path.  With wildcards,
>>> there may be more than one.  All three of the follwing path rules
>>> are equally applicable:
>>>
>>>    [/foo/*.doc]
>>>    * = r
>>>
>>>    [/foo/bar.*]
>>>    * = rw
>>>
>>>    [/foo/bar.doc]
>>>    jrandom =
>>>
>>>   To make conflicts managable, always pick the last path rule.  That
>>> means users should specify general rules first, followed by exceptions
>>> and finally (and optionally) critical rules that deny certain access,
>>> potentially globally.
>> Are you saying that a wildcard path should match before a concrete
>> path when it happens to appear later in the authz file? Perish the
>> thought. Exact matches should always override fuzzy matches,
>> anything else is not intuitive at all and we'll receive a ton of
>> spurious bug reports about how authz files don't work.
>>
>> Precedence ordering is fine between identical exact matches or
>> between equivalent wildcard matches for the same path, but not
>> between these two categories.
> I agree with the sentiment, but am not sure how to define such rules. What does "equivalent wildcard matches for the same path" mean? Do you mean to find a definition that will give the expected precedence between two wildcard patterns where one is intuitively more "exact" than the other, such as:
>
>   /projects/*/include/*_private.h
>
> and
>
>   /projects/*/include/foo.h

I'll just note that these two patterns cannot possibly be confused,
since "*_private.h" and "foo.h" will never match the same file name.


The first rule is obviously that we have to maintain compatibility with
the current semantics, which means that whatever matches the most path
segments in the query, wins. So if we have two patterns, where one is a
prefix of the other, even when only one contains wildcards:

    [/projects/foo/include]
    [*/projects/foo/include/*.h]

then when we match the following path:

    /projects/foo/include/.h/bar.h

we'll select the second set of rules, because the patterns share the
same prefix, but the second one matches one more component of the path.
The same would also hold if we the patterns:

    [*/projects/*]
    [*/projects/*/include]

We also have to define precedence in terms of top-down lookups. A simple
example: if we have two patterns:

    [*/projects/foo/include/*.h]
    [*/projects/*/include/*.h]

the first pattern should always take precedence over the second, because
it contains an exact match for /projects/foo. On the other hand if the
patterns were:

    [*/projects/f*/include/*.h]
    [*/projects/*o/include/*.h]

where both "f*" and "*o" match "foo", we should always pick whichever
pattern was defined first in the authz file.

If we have two instances of the same pattern

    [*/projects/foo/include/*.h]

that define a different set of rules, then, to maintain compatibility
with current semantics, rules in the second pattern must override rules
in the first pattern with identical keys.

-- Brane

-- 
Branko Čibej | Director of Subversion
WANdisco | Realising the impossibilities of Big Data
e. brane@wandisco.com

Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by Julian Foad <ju...@btopenworld.com>.
Branko Čibej wrote:
> Author: stefan2
>> Without support for wildcards or other patterns, the config struct
>> will only contain a single section for each path.  With wildcards,
>> there may be more than one.  All three of the follwing path rules
>> are equally applicable:
>> 
>>   [/foo/*.doc]
>>   * = r
>> 
>>   [/foo/bar.*]
>>   * = rw
>> 
>>   [/foo/bar.doc]
>>   jrandom =
>> 
>>  To make conflicts managable, always pick the last path rule.  That
>> means users should specify general rules first, followed by exceptions
>> and finally (and optionally) critical rules that deny certain access,
>> potentially globally.
> 
> Are you saying that a wildcard path should match before a concrete
> path when it happens to appear later in the authz file? Perish the
> thought. Exact matches should always override fuzzy matches,
> anything else is not intuitive at all and we'll receive a ton of
> spurious bug reports about how authz files don't work.
> 
> Precedence ordering is fine between identical exact matches or
> between equivalent wildcard matches for the same path, but not
> between these two categories.

I agree with the sentiment, but am not sure how to define such rules. What does "equivalent wildcard matches for the same path" mean? Do you mean to find a definition that will give the expected precedence between two wildcard patterns where one is intuitively more "exact" than the other, such as:

  /projects/*/include/*_private.h

and

  /projects/*/include/foo.h

?

- Julian


Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by Philip Martin <ph...@wandisco.com>.
Branko Čibej <br...@wandisco.com> writes:

> Let's please have the rules documented in the wiki, then we can comment
> on them.
>
>> * Select the path rule(s) that cover the largest section of the path.
>>   This is what we do today and there can be only one such match
>>   w/o the use of wildcards.
>>
>> * If there is more than one such rule, apply the one declared last.
>>   This seems to be the easiest rule to follow and understand.
>
> Well, I disagree, I'd expect an exact match to override a wildcard match.

That seems reasonable but raises the expectation that there is a more
general order for wildcards.  Given a path:

  /A/B/C/D

is either of the patterns:

  /A/B/C/*
  /*/B/C/D 

an override for the pattern:

  /*/*/*/*

?

Does either of the patterns /A/*/*/D and /*/B/C/* override the other?

If we choose and release some simple wildcard behaviour how easy will it
be to implement more sophisticated behaviour in future releases?

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Tue, Jul 29, 2014 at 5:55 PM, Branko Čibej <br...@wandisco.com> wrote:

>  On 29.07.2014 15:55, Stefan Fuhrmann wrote:
>

[...]

>    So, while you can without doubt parse the file streamily,
>  you cannot begin to build a data model with authz
> semantics from it streamily. We can go down the path
>  of inventing some intermediate model that performs
> slightly better than the current svn_config_t but that
> is a lot of coding simply to avoid a perfectly reasonable
> API bump.
>
>
> Actually, it's not as perfectly reasonable as all that. Consider:
>
> [foo]
> x = 1
>
> [bar]
> y = 2
>
> [foo]
> z = 3
>
> So ... in what order should these sections be enumerated?
>

The svn_config_t implementation transforms this into

[foo]
x=1
z=3

[bar]
y=2

and would enumerate them accordingly. And I agree that is not
particularly nice. But it is well-defined and the original ordering
was probably an artefact of some tool generating the file in the
first place - meaning that it's the tool's job to communicate the
semantics that the user intended. (assuming we encourage
users to sort their path rules.)

The precedence problem must be addressed by whatever data
model we chose. Consider:

[*/**/*a*]
...
[*/**/*b*]
...

I don't see any reasonable criterion, that the user could influence,
to decide which one of those two apply in case of conflict - other
than their declaration order.

Once the declaration order is available in the data model, we
can still decide when / to what extend to use this information.

-- Stefan^2.

Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by Branko Čibej <br...@wandisco.com>.
On 29.07.2014 15:55, Stefan Fuhrmann wrote:
>
> On Tue, Jul 29, 2014 at 3:28 PM, Branko Čibej <brane@wandisco.com
> <ma...@wandisco.com>> wrote:
>
>     On 29.07.2014 14:55, Stefan Fuhrmann wrote:
>>     On Tue, Jul 29, 2014 at 1:52 PM, Branko Čibej <brane@wandisco.com
>>     <ma...@wandisco.com>> wrote:
>>
>>         On 29.07.2014 13:34, Stefan Fuhrmann wrote:
>>>         On Mon, Jul 28, 2014 at 7:38 PM, Branko Čibej <br...@wandisco.com> <ma...@wandisco.com> wrote:
>>>>         On 28.07.2014 19:19, stefan2@apache.org <ma...@apache.org> wrote:
>>>>>         Author: stefan2
>>>>>         Date: Mon Jul 28 17:19:47 2014
>>>>>         New Revision: 1614080
>>>>>
>>>>>         URL: http://svn.apache.org/r1614080
>>>>>         Log:
>>>>>         On the authzperf branch: Implement the notion of path rule ordering
>>>>>         by making svn_config_t iterate through sections in declaration order.
>>>>>         This is done using a simple linked list because we can't remove
>>>>>         sections but only add them.
>>>>         No no no no!
>>>         What exactly did my change break?
>>>
>>>         Right now, the svn_config interface returns the sections in random order.
>>>         I changed the code such that this random order happens to be the same
>>>         as the (static and dynamic) declaration order.
>>>
>>>>         Changing the svn_config_t structure is not the right to do this. The correct
>>>>         approach here is to parametrise the svn_config parser with a constructor
>>>>         method, then create a new constructor for the authz parser which will then
>>>>         get all entries in the file order. Please revert these svn_config changes.
>>>         I reverted the changes for now as I agree that the new behaviour
>>>         should be somewhat more explicit.
>>>
>>>         However, since this is not about construction (the result would
>>>         still be a hash)
>>
>>         You are making way too many assumptions about that. See my
>>         svn_config parser parametrization in r1614213
>>
> [...]
>
>>
>>>     The second point is more severe, though, as it prevents a nicer
>>>     intermediate
>>>     model than what svn_config_t already provides. The following is
>>>     a legal authz
>>>     file:
>>
>     I'm going to skip commenting on this because I don't see how it's
>     relevant.
>
>  
> The point is that you can't interpret any of the stream
> beyond the key/value model already implemented by
> svn_config_t until you have read the whole stream.

That's true for config files, but not necessarily for authz files.


> So, while you can without doubt parse the file streamily,
> you cannot begin to build a data model with authz
> semantics from it streamily. We can go down the path
> of inventing some intermediate model that performs
> slightly better than the current svn_config_t but that
> is a lot of coding simply to avoid a perfectly reasonable
> API bump.

Actually, it's not as perfectly reasonable as all that. Consider:

[foo]
x = 1

[bar]
y = 2

[foo]
z = 3

So ... in what order should these sections be enumerated?

-- Brane


-- 
Branko Čibej | Director of Subversion
WANdisco | Realising the impossibilities of Big Data
e. brane@wandisco.com

Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Tue, Jul 29, 2014 at 3:28 PM, Branko Čibej <br...@wandisco.com> wrote:

>  On 29.07.2014 14:55, Stefan Fuhrmann wrote:
>
>  On Tue, Jul 29, 2014 at 1:52 PM, Branko Čibej <br...@wandisco.com> wrote:
>
>>   On 29.07.2014 13:34, Stefan Fuhrmann wrote:
>>
>> On Mon, Jul 28, 2014 at 7:38 PM, Branko Čibej <br...@wandisco.com> <br...@wandisco.com> wrote:
>>
>>  On 28.07.2014 19:19, stefan2@apache.org wrote:
>>
>>  Author: stefan2
>> Date: Mon Jul 28 17:19:47 2014
>> New Revision: 1614080
>>
>> URL: http://svn.apache.org/r1614080
>> Log:
>> On the authzperf branch: Implement the notion of path rule ordering
>> by making svn_config_t iterate through sections in declaration order.
>> This is done using a simple linked list because we can't remove
>> sections but only add them.
>>
>>  No no no no!
>>
>>  What exactly did my change break?
>>
>> Right now, the svn_config interface returns the sections in random order.
>> I changed the code such that this random order happens to be the same
>> as the (static and dynamic) declaration order.
>>
>>
>>  Changing the svn_config_t structure is not the right to do this. The correct
>> approach here is to parametrise the svn_config parser with a constructor
>> method, then create a new constructor for the authz parser which will then
>> get all entries in the file order. Please revert these svn_config changes.
>>
>>  I reverted the changes for now as I agree that the new behaviour
>> should be somewhat more explicit.
>>
>> However, since this is not about construction (the result would
>> still be a hash)
>>
>>
>>  You are making way too many assumptions about that. See my svn_config
>> parser parametrization in r1614213
>>
> [...]

>
>   The second point is more severe, though, as it prevents a nicer
> intermediate
> model than what svn_config_t already provides. The following is a legal
> authz
> file:
>
>
> I'm going to skip commenting on this because I don't see how it's relevant.
>

The point is that you can't interpret any of the stream
beyond the key/value model already implemented by
svn_config_t until you have read the whole stream.

So, while you can without doubt parse the file streamily,
you cannot begin to build a data model with authz
semantics from it streamily. We can go down the path
of inventing some intermediate model that performs
slightly better than the current svn_config_t but that
is a lot of coding simply to avoid a perfectly reasonable
API bump.


>
> [...]
>
>
>   Well, I disagree, I'd expect an exact match to override a wildcard
> match.
>
>  I agree that we could impose such a rule. However, it's semantics
> may not be obvious to everyone because we always apply rules to whole
> subtrees. Consider:
>
>  [/foo/bar]
>  user = r
>
>  [/**/bar]
>  user =
>
>
> ** is a special case because it matches more than one path segment, so in
> this case I would expect the longest match to take precedence.
>
> I agree though that it complicates matters, not so much in the
> implementation, but in documentation; explaining to users why '*' behaves
> differently than '**' might be harder than explaining why a wildcard
> defined later in the file completely overrides an exact match defined
> earlier.
>

Well, I agree that we can implement it either way and even
change our mind later on without real extra effort and I
personally don't care too much which behavior we are
going to release eventually.

That said, I still believe that sorting ones rules by path is
good practice and then having the "last match wins" rule
should be obvious and easier to apply by the user. I don't
think that the "pattern path rules are inferior" rule is hard
to understand. My concern is more that there is way for
the user to invert that behavior.

-- Stefan^2.

Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by Branko Čibej <br...@wandisco.com>.
On 29.07.2014 14:55, Stefan Fuhrmann wrote:
> On Tue, Jul 29, 2014 at 1:52 PM, Branko Čibej <brane@wandisco.com
> <ma...@wandisco.com>> wrote:
>
>     On 29.07.2014 13:34, Stefan Fuhrmann wrote:
>>     On Mon, Jul 28, 2014 at 7:38 PM, Branko Čibej <br...@wandisco.com> <ma...@wandisco.com> wrote:
>>>     On 28.07.2014 19:19, stefan2@apache.org <ma...@apache.org> wrote:
>>>>     Author: stefan2
>>>>     Date: Mon Jul 28 17:19:47 2014
>>>>     New Revision: 1614080
>>>>
>>>>     URL: http://svn.apache.org/r1614080
>>>>     Log:
>>>>     On the authzperf branch: Implement the notion of path rule ordering
>>>>     by making svn_config_t iterate through sections in declaration order.
>>>>     This is done using a simple linked list because we can't remove
>>>>     sections but only add them.
>>>     No no no no!
>>     What exactly did my change break?
>>
>>     Right now, the svn_config interface returns the sections in random order.
>>     I changed the code such that this random order happens to be the same
>>     as the (static and dynamic) declaration order.
>>
>>>     Changing the svn_config_t structure is not the right to do this. The correct
>>>     approach here is to parametrise the svn_config parser with a constructor
>>>     method, then create a new constructor for the authz parser which will then
>>>     get all entries in the file order. Please revert these svn_config changes.
>>     I reverted the changes for now as I agree that the new behaviour
>>     should be somewhat more explicit.
>>
>>     However, since this is not about construction (the result would
>>     still be a hash)
>
>     You are making way too many assumptions about that. See my
>     svn_config parser parametrization in r1614213
>
>
>
>
>>      but about iteration, I see only two basic options:
>>
>>     1. Invent a new data structure that duplicates 80+% of svn_config_t
>>       to provide the same functionality but different ordering through
>>       its 'enumerate sections' interface. This basically would be r1614080
>>       minus a few bits that we happen not to call in the authz context.
>>
>>     2. Reapply r1614080 but let 'enumerate sections' return data in
>>       hash order as on /trunk. Bump the 'enumerate sections' API
>>       to include a "use declaration order' option.
>
>     3. Feed the authz in-memory representation directly from the
>     config parser, without using an intermediate svn_config_t. This is
>     what we should do. The authz structure is inherently different
>     from a config structure, because it has different semantics and
>     access patterns.
>
>
> Except that this will be equivalent to 1. but worse and here is why.
>
> In many situation, two "users" will be used with the same connection,
> anonymous and the actual user. Having svn_config_t as an intermediate
> representation prevents us from parsing the file twice. Just reading
> the in-memory config representation twice should be faster than parsing
> the data twice with no intermediate. Throw svnserve's ability to cache
> the intermediate rep into the mix and things look even worse for direct
> parsing.

Well then, change the in-memory representation so that which it doesn't
depend on which user's behalf the lookup is being done. I believe Ben
already suggested that, and for exactly the reason you mention above.
The in-memory representation should be immutable once parsed. That
implies that you can optimize path lookup, and you can optimize (alias,
group) -> user mappings, you can even cache path->ACL mappings, but you
cannot cache (user, path)->access mappings.

I believe that last is exactly what you're doing, and I very strongly
disagree with this idea. Here's why: while we do currently parse the
authz file once per connection in mod_authz_svn, it's entirely
reasonable to someday not do that in svnserve (or some future long-lived
server) and instead either pick up changes automatically (by looking at
the file timestamp, or noticing changes at commit time), or implementing
a SIGHUP or similar mechanism to refresh the in-memory configuration.

Another reason is that during authz lookup, we must not ignore rules
that do not (implicitly) mention the user: we must first find the rule
that is the best match for the path, and only later look at what
permissions, if any, it gives the user, because you can't assume that if
the user doesn't have access to /foo, she won't have access to /foo/bar/baz.

I'm also not at all convinced that re-parsing the whole authz
configuration for every new user is going to save time; the opposite is
more likely.

Implementing my suggestion does imply that the ACL has to become a
rather smart structure that can be used to quickly determine the access
rights for a given user.

[...]

> The second point is more severe, though, as it prevents a nicer
> intermediate
> model than what svn_config_t already provides. The following is a
> legal authz
> file:

I'm going to skip commenting on this because I don't see how it's relevant.

[...]

> Well, I disagree, I'd expect an exact match to override a wildcard match.
>
> I agree that we could impose such a rule. However, it's semantics
> may not be obvious to everyone because we always apply rules to whole
> subtrees. Consider:
>
> [/foo/bar]
> user = r
>
> [/**/bar]
> user =

** is a special case because it matches more than one path segment, so
in this case I would expect the longest match to take precedence.

I agree though that it complicates matters, not so much in the
implementation, but in documentation; explaining to users why '*'
behaves differently than '**' might be harder than explaining why a
wildcard defined later in the file completely overrides an exact match
defined earlier.

-- Brane


-- 
Branko Čibej | Director of Subversion
WANdisco | Realising the impossibilities of Big Data
e. brane@wandisco.com

Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Tue, Jul 29, 2014 at 1:52 PM, Branko Čibej <br...@wandisco.com> wrote:

>  On 29.07.2014 13:34, Stefan Fuhrmann wrote:
>
> On Mon, Jul 28, 2014 at 7:38 PM, Branko Čibej <br...@wandisco.com> <br...@wandisco.com> wrote:
>
>  On 28.07.2014 19:19, stefan2@apache.org wrote:
>
>  Author: stefan2
> Date: Mon Jul 28 17:19:47 2014
> New Revision: 1614080
>
> URL: http://svn.apache.org/r1614080
> Log:
> On the authzperf branch: Implement the notion of path rule ordering
> by making svn_config_t iterate through sections in declaration order.
> This is done using a simple linked list because we can't remove
> sections but only add them.
>
>  No no no no!
>
>  What exactly did my change break?
>
> Right now, the svn_config interface returns the sections in random order.
> I changed the code such that this random order happens to be the same
> as the (static and dynamic) declaration order.
>
>
>  Changing the svn_config_t structure is not the right to do this. The correct
> approach here is to parametrise the svn_config parser with a constructor
> method, then create a new constructor for the authz parser which will then
> get all entries in the file order. Please revert these svn_config changes.
>
>  I reverted the changes for now as I agree that the new behaviour
> should be somewhat more explicit.
>
> However, since this is not about construction (the result would
> still be a hash)
>
>
> You are making way too many assumptions about that. See my svn_config
> parser parametrization in r1614213
>


>
>   but about iteration, I see only two basic options:
>
> 1. Invent a new data structure that duplicates 80+% of svn_config_t
>   to provide the same functionality but different ordering through
>   its 'enumerate sections' interface. This basically would be r1614080
>   minus a few bits that we happen not to call in the authz context.
>
> 2. Reapply r1614080 but let 'enumerate sections' return data in
>   hash order as on /trunk. Bump the 'enumerate sections' API
>   to include a "use declaration order' option.
>
>
> 3. Feed the authz in-memory representation directly from the config
> parser, without using an intermediate svn_config_t. This is what we should
> do. The authz structure is inherently different from a config structure,
> because it has different semantics and access patterns.
>

Except that this will be equivalent to 1. but worse and here is why.

In many situation, two "users" will be used with the same connection,
anonymous and the actual user. Having svn_config_t as an intermediate
representation prevents us from parsing the file twice. Just reading
the in-memory config representation twice should be faster than parsing
the data twice with no intermediate. Throw svnserve's ability to cache
the intermediate rep into the mix and things look even worse for direct
parsing.

The second point is more severe, though, as it prevents a nicer intermediate
model than what svn_config_t already provides. The following is a legal
authz
file:

[/]
@group = r

[groups]
group = &alias

[aliases]
alias = user

[/]
@other = w

[groups]
other = &other

[aliases]
other = user

It has to be interpreted as

[aliases]
other = user
alias = user

[groups]
group = &alias
other = &other

[/]
@group = r
@other = w

and is equivalent to

[/]
user = rw


>  IMO, number 2 is the better choice as it gives full control over the
> added functionality without adding significant maintenance costs.
>
>
> You're already creating an in-memory structure that is significantly
> different from svn_config_t, and does not require the presence of an
> svn_config_t, nor does it have to enumerate anything, since it can be
> constructed incrementally and in-order during parsing.
>

This time, you are making to many assumptions. See above.


>
> [...]
>
>  Are you saying that a wildcard path should match before a concrete path when
> it happens to appear later in the authz file? Perish the thought. Exact
> matches should always override fuzzy matches, anything else is not intuitive
> at all and we'll receive a ton of spurious bug reports about how authz files
> don't work.
>
>  No, that is not what I'm saying. My rules are simple:
>
>
> Let's please have the rules documented in the wiki, then we can comment on
> them.
>

I will update the design to match the current implementation and then
upload it.


>  * Select the path rule(s) that cover the largest section of the path.
>   This is what we do today and there can be only one such match
>   w/o the use of wildcards.
>
> * If there is more than one such rule, apply the one declared last.
>   This seems to be the easiest rule to follow and understand.
>
>
> Well, I disagree, I'd expect an exact match to override a wildcard match.
>

I agree that we could impose such a rule. However, it's semantics
may not be obvious to everyone because we always apply rules to whole
subtrees. Consider:

[/foo/bar]
user = r

[/**/bar]
user =

Applying your suggestion, the user has access to /foo/bar and /for/bar/x
but not to /foo/bar/bar nor /foo/bar/bar/x. Basically, you are moving all
wildcard rules to the top of the file and then apply precedence.

There is no need to introduce the concept of "exact match" to model
the same set of rules. OTOH, complicates system behavior as it only
applies to the path itself but not its sub-paths. Moreover, we cannot fix
this by extending it to whole subtree because

[/]
* = r

is a common rule. It would then override any wildcard rule.

-- Stefan^2.

Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by Branko Čibej <br...@wandisco.com>.
On 29.07.2014 13:34, Stefan Fuhrmann wrote:
> On Mon, Jul 28, 2014 at 7:38 PM, Branko Čibej <br...@wandisco.com> wrote:
>> On 28.07.2014 19:19, stefan2@apache.org wrote:
>>> Author: stefan2
>>> Date: Mon Jul 28 17:19:47 2014
>>> New Revision: 1614080
>>>
>>> URL: http://svn.apache.org/r1614080
>>> Log:
>>> On the authzperf branch: Implement the notion of path rule ordering
>>> by making svn_config_t iterate through sections in declaration order.
>>> This is done using a simple linked list because we can't remove
>>> sections but only add them.
>>
>> No no no no!
> What exactly did my change break?
>
> Right now, the svn_config interface returns the sections in random order.
> I changed the code such that this random order happens to be the same
> as the (static and dynamic) declaration order.
>
>> Changing the svn_config_t structure is not the right to do this. The correct
>> approach here is to parametrise the svn_config parser with a constructor
>> method, then create a new constructor for the authz parser which will then
>> get all entries in the file order. Please revert these svn_config changes.
> I reverted the changes for now as I agree that the new behaviour
> should be somewhat more explicit.
>
> However, since this is not about construction (the result would
> still be a hash)

You are making way too many assumptions about that. See my svn_config
parser parametrization in r1614213.

>  but about iteration, I see only two basic options:
>
> 1. Invent a new data structure that duplicates 80+% of svn_config_t
>   to provide the same functionality but different ordering through
>   its 'enumerate sections' interface. This basically would be r1614080
>   minus a few bits that we happen not to call in the authz context.
>
> 2. Reapply r1614080 but let 'enumerate sections' return data in
>   hash order as on /trunk. Bump the 'enumerate sections' API
>   to include a "use declaration order' option.

3. Feed the authz in-memory representation directly from the config
parser, without using an intermediate svn_config_t. This is what we
should do. The authz structure is inherently different from a config
structure, because it has different semantics and access patterns.

> IMO, number 2 is the better choice as it gives full control over the
> added functionality without adding significant maintenance costs.

You're already creating an in-memory structure that is significantly
different from svn_config_t, and does not require the presence of an
svn_config_t, nor does it have to enumerate anything, since it can be
constructed incrementally and in-order during parsing.

[...]

>> Are you saying that a wildcard path should match before a concrete path when
>> it happens to appear later in the authz file? Perish the thought. Exact
>> matches should always override fuzzy matches, anything else is not intuitive
>> at all and we'll receive a ton of spurious bug reports about how authz files
>> don't work.
> No, that is not what I'm saying. My rules are simple:

Let's please have the rules documented in the wiki, then we can comment
on them.

> * Select the path rule(s) that cover the largest section of the path.
>   This is what we do today and there can be only one such match
>   w/o the use of wildcards.
>
> * If there is more than one such rule, apply the one declared last.
>   This seems to be the easiest rule to follow and understand.

Well, I disagree, I'd expect an exact match to override a wildcard match.

-- Brane


-- 
Branko Čibej | Director of Subversion
WANdisco | Realising the impossibilities of Big Data
e. brane@wandisco.com

Re: svn commit: r1614080 - in /subversion/branches/authzperf: BRANCH-README subversion/libsvn_repos/authz.c subversion/libsvn_subr/config.c subversion/libsvn_subr/config_impl.h

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Mon, Jul 28, 2014 at 7:38 PM, Branko Čibej <br...@wandisco.com> wrote:
> On 28.07.2014 19:19, stefan2@apache.org wrote:
>>
>> Author: stefan2
>> Date: Mon Jul 28 17:19:47 2014
>> New Revision: 1614080
>>
>> URL: http://svn.apache.org/r1614080
>> Log:
>> On the authzperf branch: Implement the notion of path rule ordering
>> by making svn_config_t iterate through sections in declaration order.
>> This is done using a simple linked list because we can't remove
>> sections but only add them.
>
>
> No no no no!

What exactly did my change break?

Right now, the svn_config interface returns the sections in random order.
I changed the code such that this random order happens to be the same
as the (static and dynamic) declaration order.

> Changing the svn_config_t structure is not the right to do this. The correct
> approach here is to parametrise the svn_config parser with a constructor
> method, then create a new constructor for the authz parser which will then
> get all entries in the file order. Please revert these svn_config changes.

I reverted the changes for now as I agree that the new behaviour
should be somewhat more explicit.

However, since this is not about construction (the result would
still be a hash) but about iteration, I see only two basic options:

1. Invent a new data structure that duplicates 80+% of svn_config_t
  to provide the same functionality but different ordering through
  its 'enumerate sections' interface. This basically would be r1614080
  minus a few bits that we happen not to call in the authz context.

2. Reapply r1614080 but let 'enumerate sections' return data in
  hash order as on /trunk. Bump the 'enumerate sections' API
  to include a "use declaration order' option.

IMO, number 2 is the better choice as it gives full control over the
added functionality without adding significant maintenance costs.

>> Without support for wildcards or other patterns, the config struct
>> will only contain a single section for each path.  With wildcards,
>> there may be more than one.  All three of the follwing path rules
>> are equally applicable:
>>
>>   [/foo/*.doc]
>>   * = r
>>
>>   [/foo/bar.*]
>>   * = rw
>>
>>   [/foo/bar.doc]
>>   jrandom =
>>
>> To make conflicts managable, always pick the last path rule.  That
>> means users should specify general rules first, followed by exceptions
>> and finally (and optionally) critical rules that deny certain access,
>> potentially globally.
>
> Are you saying that a wildcard path should match before a concrete path when
> it happens to appear later in the authz file? Perish the thought. Exact
> matches should always override fuzzy matches, anything else is not intuitive
> at all and we'll receive a ton of spurious bug reports about how authz files
> don't work.

No, that is not what I'm saying. My rules are simple:

* Select the path rule(s) that cover the largest section of the path.
  This is what we do today and there can be only one such match
  w/o the use of wildcards.

* If there is more than one such rule, apply the one declared last.
  This seems to be the easiest rule to follow and understand.

Those rules suggest the following ordering:

* Declare old-style path rules in tree order. That makes it easy
  to understand what applies where. I expect most people today
  do it that way. Everything else leads to "surprising" rules, in
  particular as later declarations for the exact same path will
  be merged with previous ones and rules for the same user /
  alias / group be overwritten.

* Specify generic rules first. These define system or repo-wide
  default rules and shall be overruled by local exceptions.
  For example make "*/**/*.docx" r/o for everybody. Specific
  people may later get write access to individual docx files or
  all docx files in a given sub-tree etc. To the user, this looks
  like a natural outcome of the previous ordering.

* Sub-tree specific rules wildcard rules should be declared
  immediately before the respective old-style rules for that
  sub-tree. This is not a new rule but a mere result of the
  previous two.

* "Catch-all" rules shall be placed last. These should be seen
  as a "last resort" to e.g. deny access to certain data irrespective
  of any previous declaration. E.g. "*/**/mysecret.docx" will catch
  all copies everywhere as long as they share the same name.
  For the same reason, this must also be the last resort because
  it may unintentionally break things for other people.

> Precedence ordering is fine between identical exact matches or between
> equivalent wildcard matches for the same path, but not between these two
> categories.

Why? The only difference is that you can't do the "catch all"
anymore while not providing any tangible benefit: Ordering
must still be evaluated and if people don't sort their path rules
by path, they will still get into trouble due to losing overview.

-- Stefan^2.