
Posted to dev@subversion.apache.org by Daniel Shahaf <d....@daniel.shahaf.name> on 2012/01/01 03:21:13 UTC

Re: Problems with the documentation of Subversion dump format

Mark Mielke wrote on Sat, Dec 31, 2011 at 01:00:12 -0500:
> On 12/30/2011 09:35 PM, Daniel Shahaf wrote:
> >Mark Mielke wrote on Fri, Dec 30, 2011 at 20:22:50 -0500:
> >
> >>I think you are not understanding my concern. If svn:author is only
> >>ever displayed to the user - then "authenticated username" may not
> >>be a desirable form to use. For teams of 10 people, sure you can
> >>recognize the uid of everybody in the team. But what about teams of
> >>100, or teams of 1000?
> >AuthLDAPRemoteUserAttribute cn
> >
> >Then you can do
> >
> >% svn commit --username "Daniel Shahaf"
> >
> >and the logs will show
> >
> >------------------------------------------------------------------------
> >r1 | Daniel Shahaf | strftime(...) | 1 line
> >------------------------------------------------------------------------
> 
> We use this for a few services - but note how now instead of losing
> the full name, it now loses the unique identifier. In a company of
> 1,000+ people, there is a good chance for overlap of "cn". There
> might be only one Mark Mielke, but other names such as John Sullivan
> there could be many. The "cn" is not a unique identifier and cannot
> be used to key off. It is for display purposes only.
> 

Another idea is to change the revprop's value in the pre-commit or
post-commit hook:
    ..
    author=`svnlook propget --revprop -t $TXN $REPOS svn:author`
    svnadmin setrevprop -t $TXN svn:author "`getent passwd $author | cut -d: -f5 | cut -d, -f1` <$a...@localdomain>"
    ..
and then people still authenticate with their uids, but all existing
tools will automatically show DVCS-style name+address author names.
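
A runnable sketch of the same idea, written as a post-commit hook because `svnadmin setrevprop` operates on a committed revision (`-r REV`) and reads the new value from a file. It assumes the committer's uid exists in the local passwd database and uses `example.com` as a placeholder mail domain (an assumption, not from the thread). By default `svnadmin setrevprop` does not run the revprop-change hooks.

```shell
#!/bin/sh
# post-commit hook sketch: replace the uid in svn:author with
# "Full Name <uid@MAIL_DOMAIN>". MAIL_DOMAIN is a placeholder assumption.
MAIL_DOMAIN=${MAIL_DOMAIN:-example.com}

# Build the display string from one passwd(5) line read on stdin.
# Field 1 is the uid, field 5 is GECOS; keep GECOS up to the first comma.
format_author() {
    awk -F: -v dom="$MAIL_DOMAIN" '{
        name = $5
        sub(/,.*/, "", name)
        if (name == "") name = $1
        printf "%s <%s@%s>", name, $1, dom
    }'
}

rewrite_author() {
    repos=$1 rev=$2
    uid=$(svnlook author -r "$rev" "$repos") || return 1
    entry=$(getent passwd "$uid") || return 0   # unknown uid: leave as is
    tmp=$(mktemp) || return 1
    printf '%s\n' "$entry" | format_author > "$tmp"
    # svnadmin reads the new value from the file (no trailing newline here).
    svnadmin setrevprop "$repos" -r "$rev" svn:author "$tmp"
    status=$?
    rm -f "$tmp"
    return $status
}

# Subversion invokes the hook as: post-commit REPOS REV [TXN_NAME]
if [ $# -ge 2 ]; then
    rewrite_author "$1" "$2"
fi
```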

And if _that_'s not good enough... what Stefan said: someone needs to
sit down, define a problem, design a solution, and push it through.
Perhaps it's as simple as defining a few new revprops?

> All of this falls under the banner of thinking small. Small teams.
> Few requirements. Most products are like this. Sorry. I know you are
> just trying to help. :-)
> 
> -- 
> Mark Mielke<ma...@mielke.cc>
>

making progress in a meritocracy (was: Re: format of svn:author)

Posted by Stefan Sperling <st...@elego.de>.

On Tue, Jan 03, 2012 at 12:53:39PM -0500, Mark Mielke wrote:
> 1) Require a means to reliably determine the AUTHOR of a changeset.
> Reliable here means machine consumable in a standard format which
> all tools are aware of because the standard is documented.
> 
> 2) Require all native output from the tool (such as "svn log")
> designed to be read by humans to include a convenient and easily
> readable format.
> 
> 3) Provide a standard convention or protocol for 3rd party tools to
> reliably determine either the unique identifier or the humanly
> readable expansion from Subversion. Either provide additional
> information in the commit itself, or provide a mechanism to either
> lookup the information, or a mechanism to lookup how to get the
> information.

That's a good summary of your requirements. It sums up what you've
been explaining so far.

> >You can keep criticising us all you want, it won't change a thing
> >if you don't also explain in detail what needs to be changed.
> >We cannot read your mind to obtain a functional specification.
> 
> I know. That's why I wanted to go away and think before coming back.

Then do that. Nobody's stopping you.
I think this is exactly what you'd need to do next.

> I'm probably wrong for being frustrated, and I'm probably wrong for
> how I am approaching this. I should probably just sit back and watch
> again for a while. I'm sure you haven't appreciated my criticism,

I suspect the process seems hard to you because this community works
as a meritocracy and you aren't used to working in this way.
Maybe I'm wrong and you already understand most of what I'm about to
say but I'll write it down anyway in case it helps.

Other projects work differently. It might seem easier for things
to happen quickly and arguments to be decided when there's one benevolent
dictator sitting on top making all the tough decisions, like in the git
project, or with a company where upper management is responsible for
enforcing the direction the project is going (clearcase, perforce).

The Subversion project doesn't use either of these approaches.

I suspect that you're frustrated with the style of development that
happens in a community driven by consensus, because you want to see
stuff get done fast, and are afraid of investing time and effort which
might turn out to be in vain in case the community as a whole doesn't
accept the results of your work.

On top of that, judging reactions from a group of people you've never
physically met and trying to gauge the general opinion that's forming
during discussion within such a group is very hard. It requires a bit
of luck as well as confidence and communication skills.

But remember that just because somebody is objecting to aspects of your
ideas doesn't mean that your ideas won't eventually be accepted.
The purpose of the process is to filter out the good ideas from the bad
ones, and transform good ideas into great ones. That requires time, a thick
skin, and the ability to seriously question one's own motives and ideas
for the merit they will bring to the entire community comprised of users
and developers.

Stuff gets done when one or more people who want to drive change sit
down and do the work. When they get stuck at any point in the development
process (design, implementation, testing...) they consult this list for help.
This also allows them to keep an eye on the community's reaction to
their work. There is no requirement for drivers of change to already be
known members of the development community -- the community simply grows
when this happens. The only requirement is that, eventually, everyone is
happy with the changes being made. In extreme cases, there may be voting.
But in this project voting has only happened twice in more than 10 years,
and one of those instances was about whitespace formatting in the code so,
yes, we tend to have long-winded discussions :)

Whenever I make non-trivial changes to Subversion I make two assumptions
that you don't seem to be happy to make. I assume that it will take me
forever to get it done, and I assume my ideas and my code are initially
mostly wrong. I don't start out assuming I know what's right.
I assume I'm wrong, and then slowly try to work towards being less wrong.
I rely on the community to help me be less wrong and move towards being
right. That's one way of eventually getting things done in a meritocracy.
This approach prevents frustration about anyone but myself. If I fail,
it is my own fault, so I need to improve and try again.
I don't fail because I was right and everyone else didn't agree with me.

Re: format of svn:author

Posted by Mark Mielke <ma...@mark.mielke.cc>.

On 01/03/2012 12:27 PM, Stefan Sperling wrote:
> On Tue, Jan 03, 2012 at 12:11:20PM -0500, Mark Mielke wrote:
>> Other solutions provide these capabilities out of box.
> Could you point out which solutions exist so people can take a look at them?

GIT, ClearCase, and Perforce are the ones I use.

GIT has name, email, signing keys, and others but is not widely used in 
our organization at this time. The original start of this thread was a 
person talking about GIT and mapping attributes from GIT to Subversion 
and how Subversion didn't seem to have the right attributes to do this.

ClearCase, as mentioned, stores both a "current owner" and a "creation 
event". The "current owner" is in native format - the UNIX uid#/gid#, 
and mapped back to a UNIX username/groupname on demand. The current 
owner is more useful for access controls. The "creation event" contains 
a string which includes the username, domainname (NIS), and fullname. We 
have a customization against ClearCase that adds in submission 
authentication to support shared UNIX accounts that will tie a unique 
identifier to the submission.

Perforce has an account management system of its own. Which isn't 
necessarily an end goal on its own, but when tied with a synchronization 
system such that Perforce accounts match upstream accounts, it does work 
fairly well.

But these are just examples of what other people have done. The 
requirements are pretty straightforward:

1) Require a means to reliably determine the AUTHOR of a changeset. 
Reliable here means machine consumable in a standard format which all 
tools are aware of because the standard is documented.

2) Require all native output from the tool (such as "svn log") designed 
to be read by humans to include a convenient and easily readable format.

3) Provide a standard convention or protocol for 3rd party tools to 
reliably determine either the unique identifier or the humanly readable 
expansion from Subversion. Either provide additional information in the 
commit itself, or provide a mechanism to either lookup the information, 
or a mechanism to lookup how to get the information.

GIT uses attributes. ClearCase uses attributes. Perforce uses lookups.

> You can keep criticising us all you want, it won't change a thing
> if you don't also explain in detail what needs to be changed.
> We cannot read your mind to obtain a functional specification.

I know. That's why I wanted to go away and think before coming back.

I am trying to determine where we would like to go next, and although 
Subversion has been a favourite of mine since 2003 or so, every time I 
try to seriously consider it, it seriously disappoints me. Our 
organization will put time and money into the direction we choose - but 
I can't responsibly select a tool which does not meet our requirements 
no matter what my fancy.

I've been waiting a long time for Subversion to come of age. I've 
monitored throughout. From time to time, I've tried to help. It is 
disenchanting when I see new solutions come out of nowhere (i.e. GIT) 
that already meet requirements out of box with authors and contributors 
that already understand the problems and the solutions which seem to be 
extremely difficult to implement in Subversion. Everything - even things 
as simple as this problem - seem like an incredible chore in Subversion 
land.

I'm probably wrong for being frustrated, and I'm probably wrong for how 
I am approaching this. I should probably just sit back and watch again 
for a while. I'm sure you haven't appreciated my criticism, and for many 
of you it is probably not deserved. You have your itches to scratch, and 
you are itching them. Why should you care about my itches? You are being 
nice to me to bother to consider my itches at all. :-)

-- 
Mark Mielke<ma...@mielke.cc>

Re: format of svn:author

Posted by Stefan Sperling <st...@elego.de>.

On Tue, Jan 03, 2012 at 12:11:20PM -0500, Mark Mielke wrote:
> Other solutions provide these capabilities out of box.

Could you point out which solutions exist so people can take a look at them?

You can keep criticising us all you want, it won't change a thing
if you don't also explain in detail what needs to be changed.
We cannot read your mind to obtain a functional specification.

Re: format of svn:author

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

kmradke@rockwellcollins.com wrote on Thu, Jan 05, 2012 at 14:03:37 -0600:
> Mark Mielke <ma...@mark.mielke.cc> wrote on 01/05/2012 12:36:10 PM:
> > On 01/05/2012 12:34 PM, Branko Čibej wrote:
> > > On 05.01.2012 18:25, Mark Mielke wrote:
> > >> On 01/05/2012 12:04 PM, Branko Čibej wrote:
> > >>> Ha, but svn:author currently fills that role. So why add another
> > >>> property?
> > >> If svn:author is defined as the primary key and also the
> > >> authentication key, it does seem simpler and more compatible with
> > >> existing tool assumptions and existing documentation.
> > > svn:author is basically "the username". Of course, many installations,
> > > especially those that use client certificates, will put other things
> > > there; an example I've often seen is CN (Email), which usually is not
> > > what you'd really want since neither is unique or persistent.
> > 
> > Yep. Microsoft AD likes to use user's name in the DN (Distinguished 
> > Name), or at least that is how many people seem to configure it. Yuck. 
> > In any case, I would say it's the responsibility of the organization to 
> > decide what their unique identifier is. If they choose a bad one - 
> > that's on them. :-)
> > 
> > For many systems, username is pretty good.
> 
> Coming late to the discussion, but assuming you are using apache,
> one could use an existing (or custom) auth module in apache
> to mangle/rewrite/map the provided user id that subversion
> uses to something that may be more useful.  Subversion will
> then happily store whatever is provided in the author field.
> This would purely be a server side configuration.

You can do that in the pre-commit hook too.

Re: format of svn:author

Posted by Julian Foad <ju...@btopenworld.com>.

Mark Mielke wrote:

> Stefan Fuhrmann wrote:
>>  On 04.01.2012 19:42, Julian Foad wrote:
>>>     The extended author fields are delivered through revision 
>>> properties [that] are readable but not writable by clients.
>> 
>>  Maybe I missed something in your post, but I want
>>  to stress that it is very important to be able to change
>>  that information later on.
> 
> The idea that Julian put forward is that these would be calculated 
> fields. Never stored. [...]

That's right.  Thanks for clarifying, Mark.

By the way, I'm happy to continue giving feedback, reviews and suggestions on this feature.  I can't at this stage promise to do more than that.

- Julian

Re: format of svn:author

Posted by Mark Mielke <ma...@mark.mielke.cc>.

On 01/08/2012 09:55 PM, Stefan Fuhrmann wrote:
> On 04.01.2012 19:42, Julian Foad wrote:
>>    The extended author fields are delivered through revision 
>> properties.  The values are UTF-8 text.  These revision properties 
>> are readable but not writable by clients.
>
> Maybe I missed something in your post, but I want
> to stress that it is very important to be able to change
> that information later on.

The idea that Julian put forward is that these would be calculated 
fields. Never stored. Always "fresh" (beyond allowance for server side 
caching configured according to requirements).

I think this is perfectly adequate for all use cases I can think of. 
However, it does mean that historical users need to be maintained within 
the server-side database that is used to calculate the field values.

Any fields which are truly per-commit and not per-user, are not 
svn:author fields. They are something else. Per-user fields can stay as 
"latest" or if there is ever a scenario where a user must have an "old" 
identity and a "new" identity, this could be accomplished by having the 
user change their unique identifier allowing both old- and new- mappings 
to remain valid.

-- 
Mark Mielke<ma...@mielke.cc>

Re: format of svn:author

Posted by Stefan Fuhrmann <eq...@web.de>.

On 04.01.2012 19:42, Julian Foad wrote:
>    The extended author fields are delivered through revision properties.  The values are UTF-8 text.  These revision properties are readable but not writable by clients.

Maybe I missed something in your post, but I want
to stress that it is very important to be able to change
that information later on.

One use case is a repository move; another is
changes to the user accounts themselves (I have had
that more than once in the past). Because typical
pre-revprop-change scripts compare the current
user with the revision's creator before they accept
log message changes, an update of old user info
seems necessary.
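
For illustration, a minimal pre-revprop-change hook of the kind described here. Subversion calls the hook with REPOS REV USER PROPNAME ACTION; this sketch permits only modifications (action `M`) of `svn:log`, and only when the requesting user equals `svn:author` of that revision. If `svn:author` has since been rewritten (for example, to a display name), the comparison fails, which is exactly the problem raised above. The helper name `may_change_revprop` is hypothetical.

```shell
#!/bin/sh
# pre-revprop-change sketch: only the original author may edit svn:log.

# Decide from plain values so the policy can be tested without a repository.
# Args: USER AUTHOR PROPNAME ACTION. Returns 0 to allow, non-zero to deny.
may_change_revprop() {
    [ "$3" = "svn:log" ] && [ "$4" = "M" ] && [ "$1" = "$2" ]
}

# Subversion invokes the hook as: pre-revprop-change REPOS REV USER PROPNAME ACTION
if [ $# -ge 5 ]; then
    author=$(svnlook author -r "$2" "$1") || exit 1
    if may_change_revprop "$3" "$author" "$4" "$5"; then
        exit 0
    fi
    # stderr is relayed to the client when the hook fails
    echo "Only the author of r$2 may change its svn:log." >&2
    exit 1
fi
```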

While we are at it, a server-side tool for efficient batch
user changes would be nice (millions of revprop
changes distributed over multiple repositories).
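
A rough sketch of such a batch tool for a single repository, assuming direct filesystem access on the server: it walks every revision with `svnlook` and rewrites matching `svn:author` values with `svnadmin setrevprop`. `rename_author` is a hypothetical helper, not an existing Subversion command, and spawning one `svnlook` per revision would be slow at the scale of millions of revisions.

```shell
#!/bin/sh
# Hypothetical helper: rename_author REPOS OLD NEW
# Sets svn:author to NEW on every revision whose svn:author is exactly OLD.
rename_author() {
    repos=$1 old=$2 new=$3
    youngest=$(svnlook youngest "$repos") || return 1
    tmp=$(mktemp) || return 1
    printf '%s' "$new" > "$tmp"      # value without a trailing newline
    rev=1                            # r0 carries no svn:author
    while [ "$rev" -le "$youngest" ]; do
        author=$(svnlook author -r "$rev" "$repos") || { rm -f "$tmp"; return 1; }
        if [ "$author" = "$old" ]; then
            # revprop-change hooks are not run unless explicitly requested
            svnadmin setrevprop "$repos" -r "$rev" svn:author "$tmp" ||
                { rm -f "$tmp"; return 1; }
        fi
        rev=$((rev + 1))
    done
    rm -f "$tmp"
}
```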
>    Three property names are initially designated  as "well known":
>
>      * prop name: "svn:author:authn-id"
>        purpose: authenticated user id
>        format: as used by Subversion's authentication (the default
>          value of svn:author)
>
>      * prop name: "svn:author:display-name"
>        purpose: display name
>        format: a single line (no line breaks), e.g. person's full
>          name or shortened name or nickname
>
>      * prop name: "svn:author:email"
>        purpose: email address
>        format: [TO BE SPECIFIED HERE]
>
A general observation: It seems impractical to store
anything but the user / account information. The
strategy would be to hope that one of the 3 aspects
of a given account will not change over time.

Modelling a real person with aspects such as having
different accounts at the same time (which happens in
sufficiently large companies) seems to be out of scope
entirely. It would open a whole new can of worms.

The deeper problem behind all this is that we record
the history of one's user management instead of only
applying the current, consistent account settings.
Introducing ACLs within SVN at some point in the
future will probably make that issue much more obvious.

-- Stefan^2.

Re: format of svn:author

Posted by Branko Čibej <br...@apache.org>.

On 04.01.2012 19:42, Julian Foad wrote:
> DESIGN
>
>   The extended author fields are delivered through revision properties.  The values are UTF-8 text.  These revision properties are readable but not writable by clients.
>
>   Three property names are initially designated  as "well known":
>
>     * prop name: "svn:author:authn-id"
>       purpose: authenticated user id
>       format: as used by Subversion's authentication (the default
>         value of svn:author)
>
>     * prop name: "svn:author:display-name"
>       purpose: display name
>       format: a single line (no line breaks), e.g. person's full
>         name or shortened name or nickname
>
>     * prop name: "svn:author:email"
>       purpose: email address
>       format: [TO BE SPECIFIED HERE]

At the /very/ least you have to define which of the properties must have
values that are unique within the given repository; what is the primary
key; and how to select the property to be shown in log, blame, and the like.

-- Brane

Re: format of svn:author

Posted by km...@rockwellcollins.com.

Mark Mielke <ma...@mark.mielke.cc> wrote on 01/05/2012 12:36:10 PM:
> On 01/05/2012 12:34 PM, Branko Čibej wrote:
> > On 05.01.2012 18:25, Mark Mielke wrote:
> >> On 01/05/2012 12:04 PM, Branko Čibej wrote:
> >>> Ha, but svn:author currently fills that role. So why add another
> >>> property?
> >> If svn:author is defined as the primary key and also the
> >> authentication key, it does seem simpler and more compatible with
> >> existing tool assumptions and existing documentation.
> > svn:author is basically "the username". Of course, many installations,
> > especially those that use client certificates, will put other things
> > there; an example I've often seen is CN (Email), which usually is not
> > what you'd really want since neither is unique or persistent.
> 
> Yep. Microsoft AD likes to use user's name in the DN (Distinguished 
> Name), or at least that is how many people seem to configure it. Yuck. 
> In any case, I would say it's the responsibility of the organization to 
> decide what their unique identifier is. If they choose a bad one - 
> that's on them. :-)
> 
> For many systems, username is pretty good.

Coming late to the discussion, but assuming you are using apache,
one could use an existing (or custom) auth module in apache
to mangle/rewrite/map the provided user id that subversion
uses to something that may be more useful.  Subversion will
then happily store whatever is provided in the author field.
This would purely be a server side configuration.  Some auth
modules already do some manipulation to what the user provides,
such as removing the windows domain info or everything
after @.
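
The same kind of normalization can be sketched in shell, for instance as a step in a hook or an account-sync script. This only illustrates the two rewrites mentioned (dropping a Windows `DOMAIN\` prefix and everything after `@`); it is not how any particular Apache module implements them, and `normalize_user` is a hypothetical name.

```shell
#!/bin/sh
# normalize_user LOGIN
#   'CORP\jdoe'             -> 'jdoe'  (drop Windows domain prefix)
#   'jdoe@CORP.EXAMPLE.COM' -> 'jdoe'  (drop everything after '@')
normalize_user() {
    u=$1
    u=${u##*\\}    # remove everything up to and including the last backslash
    u=${u%%@*}     # remove the first '@' and everything after it
    printf '%s\n' "$u"
}
```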

I'd actually hate to be capturing additional information such
as email address for a specific user since that could change
and is just duplicating what is already available via other
means.  If and when I want/need that info I'd much prefer to
look it up in a directory to get the current value instead of
relying on something attached to an old transaction.

As mentioned, choosing that unique key is important, and
in an enterprise it is essential to ensure all tools
are sharing that same identifier...

Kevin R.

Re: format of svn:author

Posted by Mark Mielke <ma...@mark.mielke.cc>.

On 01/05/2012 12:34 PM, Branko Čibej wrote:
> On 05.01.2012 18:25, Mark Mielke wrote:
>> On 01/05/2012 12:04 PM, Branko Čibej wrote:
>>> Ha, but svn:author currently fills that role. So why add another
>>> property?
>> If svn:author is defined as the primary key and also the
>> authentication key, it does seem simpler and more compatible with
>> existing tool assumptions and existing documentation.
> svn:author is basically "the username". Of course, many installations,
> especially those that use client certificates, will put other things
> there; an example I've often seen is CN (Email), which usually is not
> what you'd really want since neither is unique or persistent.

Yep. Microsoft AD likes to use user's name in the DN (Distinguished 
Name), or at least that is how many people seem to configure it. Yuck. 
In any case, I would say it's the responsibility of the organization to 
decide what their unique identifier is. If they choose a bad one - 
that's on them. :-)

For many systems, username is pretty good.

> >> There are of course some expectations around transition - such as we'd
>> only want to do the conversion to the new model once some key tools
>> supported it - "svn log", TortoiseSVN, Subclipse, and Crucible/FishEye
>> will begin working right away as the content of svn:author is now
>> recognizable as Crucible/FishEye user identifiers without the need to
>> define committer mappings and the Subversion metadata could be
>> re-indexed. I think it wouldn't be a problem beyond scheduling.
> Well, given that revision properties aren't indexed at all ... my use of
> the term "primary key" was a bit overdone, since it's really just a
> convention, not a requirement. But if we extend the way we identify
> authors, we'd better do something about enforcing these requirements, too.

Sorry for the confusion. I take it Crucible/FishEye is not widely used 
around here? In any case, FishEye is a tool like ViewVC that scans 
repositories such as Subversion repositories and creates an index to 
allow users to perform lookups from a web view. All commits owned by a 
user. Files that contain a particular text string. Etc. So when I 
mention re-index above, I mean asking Crucible/FishEye to dump its index 
and to re-scan the Subversion repositories. This would allow it to pick 
up the new properties and reset its statistics.

In terms of requirements - I don't think Subversion needs to enforce the 
requirements. It needs only make them known (which is perhaps what you 
are saying). The only true requirement is that the unique identifier can 
be reliably used to lookup additional data. The additional data may or 
may not be unique keys - but this would be up to the upstream data 
source to define. Display name would not generally be unique. Email 
might or might not be unique - there are scenarios for both. For 
example, some users may have a secondary account that they use for 
another purpose, but they might have the same "contact email address". I 
think the requirement is that svn:author be usable as a primary key, and 
that any support for pluggable modules to provide additional data will 
only be given this primary key to determine what additional data to return.

-- 
Mark Mielke<ma...@mielke.cc>

Re: format of svn:author

Posted by Branko Čibej <br...@xbc.nu>.

On 05.01.2012 18:25, Mark Mielke wrote:
> On 01/05/2012 12:04 PM, Branko Čibej wrote:
>> On 05.01.2012 11:32, Julian Foad wrote:
>>> Branko wrote:
>>>> [...] you have to define which of the properties must  have values
>>>> that are unique within the given repository; what is the primary key;
>>> OK, let's say:
>>>
>>> The "svn:author:authn-id" value is the primary key, and so is unique
>>> within a [Subversion repository | Subversion server ?].
>> Ha, but svn:author currently fills that role. So why add another
>> property?
>
> If svn:author is defined as the primary key and also the
> authentication key, it does seem simpler and more compatible with
> existing tool assumptions and existing documentation.

svn:author is basically "the username". Of course, many installations,
especially those that use client certificates, will put other things
there; an example I've often seen is CN (Email), which usually is not
what you'd really want since neither is unique or persistent.

>
>>>    The administrator must configure the Subversion server to perform
>>> a mapping from "svn:author" value to the primary key, typically the
>>> trivial "x ->  x" mapping but another example could extract "1234"
>>> from "John Doe (1234)".
>> That seems less than optimal. Your specification changes the meaning of
>> svn:author. Do you intend this to cater to the installations that are
>> already abusing and overloading svn:author?
>
> As one of these abusers, I don't mind re-writing history to fix this
> problem. I don't have a need for catering here. As per the previous
> email around the original problem of importing content from GIT, I
> don't mind either of:
>
> 1) Prevent users from setting svn:author:* properties, but if they
> happen to exist - to serve them instead of doing a lookup. In this
> case, I would migrate historical data using revprops and make
> svn:author become the primary key / unique identifier again.
>
> 2) Migrate users that do not exist into a database of removed users
> and have the data available for lookup resolution.
>
> Either would work fine.
>
> There are of course some expectations around transition - such as we'd
> only want to do the conversion to the new model once some key tools
> supported it - "svn log", TortoiseSVN, Subclipse, and Crucible/FishEye
> will begin working right away as the content of svn:author is now
> recognizable as Crucible/FishEye user identifiers without the need to
> define committer mappings and the Subversion metadata could be
> re-indexed. I think it wouldn't be a problem beyond scheduling.
>

Well, given that revision properties aren't indexed at all ... my use of
the term "primary key" was a bit overdone, since it's really just a
convention, not a requirement. But if we extend the way we identify
authors, we'd better do something about enforcing these requirements, too.

-- Brane

Re: format of svn:author

Posted by Mark Mielke <ma...@mark.mielke.cc>.

On 01/05/2012 12:04 PM, Branko Čibej wrote:
> On 05.01.2012 11:32, Julian Foad wrote:
>> Branko wrote:
>>> [...] you have to define which of the properties must  have values
>>> that are unique within the given repository; what is the primary key;
>> OK, let's say:
>>
>> The "svn:author:authn-id" value is the primary key, and so is unique within a [Subversion repository | Subversion server ?].
> Ha, but svn:author currently fills that role. So why add another property?

If svn:author is defined as the primary key and also the authentication 
key, it does seem simpler and more compatible with existing tool 
assumptions and existing documentation.

>>    The administrator must configure the Subversion server to perform a mapping from "svn:author" value to the primary key, typically the trivial "x ->  x" mapping but another example could extract "1234" from "John Doe (1234)".
> That seems less than optimal. Your specification changes the meaning of
> svn:author. Do you intend this to cater to the installations that are
> already abusing and overloading svn:author?

As one of these abusers, I don't mind re-writing history to fix this 
problem. I don't have a need for catering here. As per the previous 
email around the original problem of importing content from GIT, I don't 
mind either of:

1) Prevent users from setting svn:author:* properties, but if they 
happen to exist - to serve them instead of doing a lookup. In this case, 
I would migrate historical data using revprops and make svn:author 
become the primary key / unique identifier again.

2) Migrate users that do not exist into a database of removed users and 
have the data available for lookup resolution.

Either would work fine.
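
Both options can be combined in a small lookup sketch: a value stored on the revision (option 1) is served as-is; otherwise the field is calculated from a user database that also keeps departed users (option 2). The colon-separated file format `uid:display-name:email` and the function name `resolve_field` are assumptions for illustration only.

```shell
#!/bin/sh
# resolve_field STORED UID FIELD DBFILE
#   STORED: value of svn:author:<FIELD> on the revision, or "" if absent
#   DBFILE: lines of "uid:display-name:email" for current and removed users
resolve_field() {
    stored=$1 uid=$2 field=$3 db=$4
    if [ -n "$stored" ]; then
        printf '%s\n' "$stored"      # option 1: a stored value wins
        return 0
    fi
    case "$field" in
        display-name) col=2 ;;
        email)        col=3 ;;
        *)            return 1 ;;    # unknown field name
    esac
    # option 2: look the uid up; prints nothing if the uid is unknown
    awk -F: -v id="$uid" -v c="$col" '$1 == id { print $c; exit }' "$db"
}
```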

There are of course some expectations around transition - such as we'd 
only want to do the conversion to the new model once some key tools 
supported it - "svn log", TortoiseSVN, Subclipse, and Crucible/FishEye 
will begin working right away as the content of svn:author is now 
recognizable as Crucible/FishEye user identifiers without the need to 
define committer mappings and the Subversion metadata could be 
re-indexed. I think it wouldn't be a problem beyond scheduling.

-- 
Mark Mielke<ma...@mielke.cc>

Re: format of svn:author

Posted by Branko Čibej <br...@apache.org>.

On 05.01.2012 11:32, Julian Foad wrote:
> Branko wrote:
>> [...] you have to define which of the properties must  have values
>> that are unique within the given repository; what is the primary key;
> OK, let's say:
>
> The "svn:author:authn-id" value is the primary key, and so is unique within a [Subversion repository | Subversion server ?].

Ha, but svn:author currently fills that role. So why add another property?

>   The administrator must configure the Subversion server to perform a mapping from "svn:author" value to the primary key, typically the trivial "x -> x" mapping but another example could extract "1234" from "John Doe (1234)".

That seems less than optimal. Your specification changes the meaning of
svn:author. Do you intend this to cater to the installations that are
already abusing and overloading svn:author?

-- Brane

Re: format of svn:author

Posted by Julian Foad <ju...@btopenworld.com>.

Branko wrote:
> [...] you have to define which of the properties must  have values
> that are unique within the given repository; what is the primary key;

OK, let's say:

The "svn:author:authn-id" value is the primary key, and so is unique within a [Subversion repository | Subversion server ?].  The administrator must configure the Subversion server to perform a mapping from "svn:author" value to the primary key, typically the trivial "x -> x" mapping but another example could extract "1234" from "John Doe (1234)".
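
The example mapping could look like this. `author_to_key` is a hypothetical name, and the rule (take the text inside a trailing pair of parentheses, otherwise fall back to the trivial identity mapping) is just one possible choice.

```shell
#!/bin/sh
# author_to_key AUTHOR
#   'John Doe (1234)' -> '1234'  (id inside trailing parentheses)
#   'jdoe'            -> 'jdoe'  (trivial x -> x mapping)
author_to_key() {
    key=$(printf '%s\n' "$1" | sed -n 's/^.*(\([^()]*\))[[:space:]]*$/\1/p')
    if [ -n "$key" ]; then
        printf '%s\n' "$key"
    else
        printf '%s\n' "$1"
    fi
}
```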

This specification does not require the values of any other extended author field to be unique.

The administrator may guarantee locally that a particular extended author field is unique in some scope.  For example, a build-bot can update an issue tracker, and so needs to know the issue tracker user id for the author of a particular Subversion revision.  The administrator configures Subversion to provide that id in the "author:tracker-uid" revision property.  The issue tracker user id needs to be unique among all users of the tracker, of course, and so the administrator ensures that is true and then tells the build-bot which of Subversion's extended author fields holds the issue tracker user id: that is, "author:tracker-uid".  Note that its values are unique among all users of that issue tracker, not necessarily the same as being unique across all users of a particular Subversion repository or all Subversion repositories.

> and how to select the property to be shown in log, blame, and the like.

That is briefly stated in the "CLIENT DESIGN" section -- basically, client-side configuration.  (Client-side configuration is of course not ideal, but is a stepping stone to server-dictated configuration which is the subject of a separate and concurrent design effort.)
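
A client-side choice of the displayed field might reduce to a fallback rule like the one below. The field names follow the proposal quoted earlier in the thread (`svn:author:display-name`, `svn:author:email`, `svn:author:authn-id`); the function `author_label` and passing the configured field as an argument are assumptions, not an existing client option.

```shell
#!/bin/sh
# author_label PREFERRED DISPLAY_NAME EMAIL AUTHN_ID
#   PREFERRED is the configured field: display-name, email or authn-id.
#   Falls back to the authenticated id when the preferred field is empty.
author_label() {
    case "$1" in
        display-name) v=$2 ;;
        email)        v=$3 ;;
        *)            v=$4 ;;
    esac
    printf '%s\n' "${v:-$4}"
}
```

A real client would fetch the values with something like `svn propget svn:author:display-name --revprop -r REV URL`, which only returns a value if the server actually provides such a property.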

Mark Mielke wrote:
> On 01/04/2012 01:42 PM, Julian Foad wrote:
>>  A PROPOSAL FOR EXTENDED AUTHOR IDENTIFICATION
>> 
>>  USE CASES
>> 
>>  1. [This one I am aware of.]
>> 
>>     A large company has authenticated user ids that are numeric.  That 
>> means the "log" and "blame" information shown by most Subversion clients 
>> is not easy to understand.  Therefore they use a (post-commit?) hook to 
>> change the svn:author property to a more friendly string, which (mostly) 
>> solves the display issue.  However, it causes other problems.  [What 
>> problems?]
> 
> Problems:
> 
> 1) The unique identifier is no longer a direct match against external identity 
> management systems. [...]
> 
> 2) Users may end up with multiple unique identifiers over time [...]

So, basically, putting display information in svn:author may not cause a problem in that scenario alone, but it will cause a problem if and when other tools want the value to be a unique id.

>>  2. [This one is a guess.]
>> 
>>     The leader of a small development team sharing a Subversion repository 
>> with other teams wants to set up a build slave that will send an email [...]
> 
> Much of the above can be accomplished today as it is server side [...].
> To extend the above to a situation that makes it more difficult -

Actually I meant UC2 to be a client-side problem like you're describing, so we're both talking about the same thing.

[...]

To everything else you said: yes, sounds good.

- Julian

Re: format of svn:author

Posted by Mark Mielke <ma...@mark.mielke.cc>.

On 01/04/2012 01:42 PM, Julian Foad wrote:
> A PROPOSAL FOR EXTENDED AUTHOR IDENTIFICATION
>
> USE CASES
>
> 1. [This one I am aware of.]
>
>    A large company has authenticated user ids that are numeric.  That means the "log" and "blame" information shown by most Subversion clients is not easy to understand.  Therefore they use a (post-commit?) hook to change the svn:author property to a more friendly string, which (mostly) solves the display issue.  However, it causes other problems.  [What problems?]

Problems:

1) The unique identifier is no longer a direct match against external 
identity management systems. For example, if svn:author is "Mark Mielke 
(1234567)" and LDAP stores employeeNumber="1234567" and cn="Mark Mielke", 
very few tools support the ability to pattern-match svn:author, pull 
out character groups, and then look up the extracted group in an external 
identity management system. I can't think of a single tool that provides 
this capability out of the box. In these tools, if I am logged in as 
"1234567", the tool cannot know which commits are mine, because 
"1234567" is not equal to "Mark Mielke (1234567)".

2) Users may end up with multiple unique identifiers over time, because 
the unique identifier portion is combined with a more approximate 
(and therefore inaccurate) human-readable form. Display name or email 
may change over time. Uniquely identifying the author then becomes more 
complex, because the mapping must include every variant recorded at 
commit time. Some of this depends on which identifier is selected as 
the unique identifier - but let us say that a system such as Forge is 
used and the identifier is some sort of username such as "twoleftfeet". 
The email might start as "joe@doe.com", but end up as "jdoe@acme.com". 
Any report on commits, such as commits made per user or for a 
particular user, would either end up with split history (treating the 
history as belonging to two or more users), or the reporting algorithm 
would need to recognize each variant as the same user. Similarly, names 
can change. Perhaps the person gets married or divorced: "Mary Clairmont 
(prettygirl99)" becomes "Mary Dupont (prettygirl99)".

For both of these problems, one could argue that the reporting tool 
could take the complex value into account and parse out the unique 
identifier. This presumes that you have access to the source code and 
the ability to make the changes, which is not always the case (license 
restrictions, resource requirements, ...). This could be true of one or 
two tools, but certainly not of all the tools that support Subversion, 
as that is a fairly massive list. This is particularly problematic if 
there is no standard, because my work in my company against my 
convention is then not easily shareable with your work in your company 
against your convention.
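Here is a small Python sketch of a commits-per-user report that makes the split-history effect concrete. The "Display Name (id)" convention and the helper names are assumptions made for illustration. Grouping on the raw svn:author value splits one person into two. Parsing out the id merges them again, but only in tools that know the local convention:

```python
import re
from collections import Counter
from typing import Callable, Iterable

# Assumed local convention: "Display Name (id)".
_TRAILING_ID = re.compile(r"\(([^()]+)\)\s*$")


def trailing_id(author: str) -> str:
    """Parse out the unique id from "Display Name (id)", if present."""
    match = _TRAILING_ID.search(author)
    return match.group(1) if match else author


def commits_per_user(authors: Iterable[str],
                     key: Callable[[str], str] = lambda a: a) -> Counter:
    """Count commits per user, grouping svn:author values by key(author)."""
    return Counter(key(author) for author in authors)


# svn:author values of three commits by the same person, before and
# after a name change.
history = [
    "Mary Clairmont (prettygirl99)",
    "Mary Clairmont (prettygirl99)",
    "Mary Dupont (prettygirl99)",
]

print(commits_per_user(history))
# Counter({'Mary Clairmont (prettygirl99)': 2, 'Mary Dupont (prettygirl99)': 1})
print(commits_per_user(history, key=trailing_id))
# Counter({'prettygirl99': 3})
```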

> 2. [This one is a guess.]
>
>    The leader of a small development team sharing a Subversion repository with other teams wants to set up a build slave that will send an email to the users who committed revisions leading to a build failure.  The machine can see the Subversion user id but how can it get the user's email address?  The team leader could ask the repository administrator to add a post-commit hook that adds an email address to a revision property after every commit, but that
>
>      * requires involving the server admin;
>      * won't get updated when the user changes their email address;
>      * won't work for testing old revisions that were already committed before that time;
>      * won't work if the build slave software needs to read a list of all user id->email mappings at once.

Much of the above can be accomplished today as it is server side and 
server side gives more flexibility as it can be customized in one place. 
To extend the above to a situation that makes it more difficult -

There are a number of tools such as Crucible/FishEye that will monitor a 
Subversion repository for changes, and then take action based on the 
commit log. So the actions are being performed by "clients" and not by 
the server itself. If the "client" sees a Subversion commit for 
"1234567" or "jdoe", how does it know who is the authority on what email 
is associated with this account? With svn:author being the unique 
identifier - this is not that difficult in many cases as it is a simple 
LDAP query away. However, if we mix 1) and 2) together, we get the same 
problem. Subversion users need to see full name in "svn log" output, so 
they update svn:author to include the full name like "Mark Mielke 
(1234567)", and then Crucible/FishEye sees the commit as authored by 
"Mark Mielke (1234567)" and how does it look up this value in LDAP to 
find the email?

> 3. [This one is a guess.]
>
>    An administrator wants to integrate Subversion with an issue tracker.  Users have different user ids on the two tools.  The admin wants to configure the tracker so that it automatically annotates an already committed Subversion revision with some status information.  How can the tracker know with what user id to contact the Subversion server?

We don't have this requirement, but I believe this requirement can be 
seen in situations such as:

1) Issue tracker, such as JIRA, is externally visible. Users and 
customers can sign up to the external site directly. Their identities 
are stored in JIRA, as these are essentially "external users".

2) Source management system, such as Subversion, is internal only. Users 
and customers may be able to access the content read-only. Identities 
are stored in Microsoft Active Directory or OpenLDAP and are assigned 
according to corporate policies.

In this scenario, there are a lot of requirements to be able to map back 
and forth between the internal and external IDs. The binding might be 
stored as an LDAP attribute such as "jirauser".

I don't know if this particular problem is for Subversion to solve or 
not - but if the Subversion solution was general enough to support 
configuration that might allow this information to be exposed in a 
general way, somebody someday would probably be thankful. I wouldn't go 
out of my way to specifically solve this requirement, though. Just, if 
it comes for free with a good solution to the other requirements, don't 
block it. :-)

> The rest of the proposal addresses UC1 and part of UC2 but not UC3.  (UC3 looks like it needs some totally separate solution, outside of Subversion.)

Agree.

> REQUIREMENTS
>
>    A Subversion client (of any kind so designed) shall be able to read extended information about the author of a revision.  This information shall consist of a (possibly empty) set of fields.  The set of possible extended author fields shall include at least:
>
>      * authenticated user id
>
>      * display name
>      * email address
>
>    It shall be possible to add other fields on the server side (by software upgrade and/or by configuration), and for a client (of any kind so designed) to discover and read these fields without any software upgrade on the client side.
>
>    The svn:author property shall continue to exist.  When not using the extended author fields, the svn:author property must continue to operate as before.  When using the extended author fields, the design may restrict the use of the svn:author field.  Example: the design could require that if extended author fields are to be usable then the svn:author field always holds the authenticated user id and must always be present and non-empty.

This is a smart compromise. Forwards and backwards compatibility. 
Interface restrictions to guarantee extensibility.

In terms of some actual implementation of this, the documentation should 
probably recommend that clients make use of the display name and email 
address as standard fields, and only optionally be aware of 
repository-specific additional attributes. Otherwise it gets pretty 
messy in that you'd have to provide a means to make clients aware of 
what is being published and how and where they should be displayed. I 
would start with just the two and specific recommendations. For example, 
annotated source code on a web page might show the display name, but 
when one mouses over the display name or clicks on a gear icon to the 
side, access to additional details might be displayed. The display name 
might be linked such that a mouse click on the display name pulls up the 
user profile, but the user profile would be identified by the unique 
identifier. Enough information to recommend a consistent and useful 
interface, but not enough to be restrictive.

You cover some of this below:

>    A client shall access the extended author fields through the Subversion server, through the existing client-server protocols, possibly with protocol extensions.  Any protocol extensions shall be backward compatible in that an old server with a new client or an old client with a new server shall (without user intervention) use the old 'svn:author' property.
>
>
>    The fields that are available from a particular server or repository are determined by the administrator.  For any particular committed revision, the server may provide any or all or none of the extended author fields.  A client cannot rely on any particular field being available except to the extent that the administrator gives such an assurance.  Example: if the client requests the authenticated user id and email address for a revision whose author has no email address recorded, the server shall provide the authenticated user id but no email address.  If the server is temporarily unable to look up any information about a user, the server should respond with no extended author fields instead of waiting.
>
>
>    The extended author fields are dynamic in the sense that the server need not always return the same values for the same committed revision.  For example, a client might repeat exactly the same request for information about revision 1234 twice in quick succession, and the server might provide the email address as "a@b.c" the first time and "dd@ee.ff" the second time.  Even the "authenticated user id" field could change.
>
>
> DESIGN
>
>    The extended author fields are delivered through revision properties.  The values are UTF-8 text.  These revision properties are readable but not writable by clients.
>
>    Three property names are initially designated as "well known":
>
>      * prop name: "svn:author:authn-id"
>        purpose: authenticated user id
>        format: as used by Subversion's authentication (the default
>          value of svn:author)
>
>      * prop name: "svn:author:display-name"
>        purpose: display name
>        format: a single line (no line breaks), e.g. person's full
>          name or shortened name or nickname
>
>      * prop name: "svn:author:email"
>        purpose: email address
>        format: [TO BE SPECIFIED HERE]
>
>
>    Other property names in this name space beginning with "svn:author:" can be designated as "well known" in the future, by an official announcement from the Subversion project.
>
>    An administrator can configure other extended author fields to use property names that are not in the "svn:" name space.  Example: an administrator could configure the property name "author:pgp-sig" to hold the author's PGP signature.

Excellent.

> SERVER DESIGN
>    Any time the server is about to send a set of revision properties to the client, the server looks up the extended author fields and adds corresponding properties to the set of revision properties that it reports to the client.  These property values override any stored values of the same names.  The server looks up the extended author fields through some mechanism not defined here, using the value of the "svn:author" property as a key.  The server may cache the results, provided that there is a way for the administrator to make the server use updated information.

The cache can be a typical cache. The information that might be returned 
should generally be semi-persistent and not changing from minute to 
minute. As long as it takes effect within a reasonable time period 
(configurable along with the configuration on how to obtain the extended 
attribute information in the first place?) there is no problem.
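A rough Python sketch of the server-side behaviour quoted above. It looks up the extended fields keyed on svn:author and caches them for a configurable time. The administrator can flush the cache, and looked-up values take priority over stored ones. The class and the lookup callable are hypothetical. The lookup is assumed to enforce its own timeout and to raise OSError when the directory is unreachable, in which case no extended fields are sent:

```python
import time
from typing import Callable, Dict, Optional, Tuple

Fields = Dict[str, str]


class ExtendedAuthorCache:
    """Hypothetical helper: look up extended author fields keyed on
    svn:author, cache them for `ttl` seconds, and merge them into the
    revision properties sent to the client."""

    def __init__(self, lookup: Callable[[str], Optional[Fields]],
                 ttl: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup          # e.g. an LDAP- or getent-based back end
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Fields]] = {}

    def fields_for(self, author: str) -> Fields:
        now = self._clock()
        cached = self._entries.get(author)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        try:
            fields = self._lookup(author) or {}
        except OSError:
            # Back end unreachable: send no extended fields rather than wait.
            # Failures are not cached, so the next request retries.
            return {}
        self._entries[author] = (now, fields)
        return fields

    def invalidate(self) -> None:
        """Administrator control: force fresh look-ups on the next request."""
        self._entries.clear()

    def decorate(self, revprops: Fields) -> Fields:
        """Return revprops plus looked-up fields; looked-up values win."""
        merged = dict(revprops)
        author = revprops.get("svn:author")
        if author:
            merged.update(self.fields_for(author))
        return merged
```

The injectable clock is there only so that expiry can be tested without sleeping.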

>    If the client attempts to set any revision property in the "svn:author:" name space, the server shall report an error to the client.  This applies even if the property value matches the value that was last read from the server or is currently known to the server, and even if the specific property name is not known to the server.  If the client attempts to set any revision property that is not in the "svn:author:" name space but might be configured as an extended author field, the server records that revision property in the normal way.  If a revision property (of any name) has a stored value and the extended author field look-up also provides a value for the same property name, the latter takes priority.
>
>
>    The extended author fields [are | are not] available to the following hook scripts: pre-commit, ...

Although it is not necessary for the fields to be available to the hook 
scripts, it would be extremely convenient for them to be. We have 
hooks that perform LDAP lookups, but each hook has to have intimate 
knowledge of the environment it runs in, which makes the hooks difficult 
to publish - for example, as an open source component that others could 
re-use. They may contain hard-coded LDAP bind passwords, for example, 
making them insecure to publish. It would be extremely nice if any open 
source component writer could make use of these fields without having to 
care where the values come from, and the configuration for where the 
values come from could be centralized in one place - the Subversion server.
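Until something like this exists in the server, part of the write-protection rule quoted above can be approximated with an existing hook. Below is a Python sketch of the logic for a pre-revprop-change hook. The function names are hypothetical. Only the argument order (REPOS-PATH REVISION USER PROPNAME ACTION) and the exit-status convention come from Subversion's hook interface. As far as I know, pre-revprop-change only sees changes to revisions that already exist, so revision properties supplied at commit time would need a separate check in pre-commit:

```python
import sys
from typing import List, Optional

RESERVED_PREFIX = "svn:author:"


def refusal_reason(propname: str) -> Optional[str]:
    """Return an error message if the change must be refused, else None."""
    if propname.startswith(RESERVED_PREFIX):
        return (f"revision property '{propname}' is in the reserved "
                f"'{RESERVED_PREFIX}' name space and cannot be set by clients")
    return None


def main(argv: List[str]) -> int:
    # Subversion runs the hook as:
    #   pre-revprop-change REPOS-PATH REVISION USER PROPNAME ACTION
    # (the proposed new value arrives on stdin and is not needed here).
    if len(argv) != 6:
        sys.stderr.write("usage: pre-revprop-change REPOS REV USER PROPNAME ACTION\n")
        return 1
    propname = argv[4]
    reason = refusal_reason(propname)
    if reason:
        # A non-zero exit refuses the change; stderr is shown to the client.
        sys.stderr.write(reason + "\n")
        return 1
    # Site policy for every other property (svn:log, svn:author, ...) would
    # go here; this sketch simply allows them.
    return 0
```

To use it as the actual hook, save it as hooks/pre-revprop-change, add a final "sys.exit(main(sys.argv))" under an "if __name__ == '__main__':" guard, and make the file executable. Because it allows every other property change, a real hook would still need the site's usual policy for svn:log, svn:author and the rest.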

> CLIENT DESIGN
>
>    Just an example.  The "svn log" and "svn blame" commands could request the revision property named "svn:author:display-name", and if that is returned then use it instead of "svn:author", otherwise use the value of "svn:author".  Further, a client-side configuration option could specify which property name should be used for these display purposes, so for example some users in a particular team could choose to have the "author:nickname" revision property displayed instead of "svn:author:display-name".

This would be great. I think many people like to see the format that GIT 
uses: Display Name <em...@domain>. This should be an option.
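A Python sketch of the client-side selection described in the CLIENT DESIGN paragraph, including an option for the Git-style "Name <email>" format. The parameters preferred_prop and git_style are hypothetical stand-ins for whatever client-side configuration option would actually be added:

```python
from typing import Mapping, Optional


def author_for_display(revprops: Mapping[str, str],
                       preferred_prop: Optional[str] = None,
                       git_style: bool = False) -> str:
    """Choose the author string that "svn log" / "svn blame" would show."""
    name = None
    # A configured property (e.g. "author:nickname") wins over the
    # well-known display name.
    for prop in (preferred_prop, "svn:author:display-name"):
        if prop and revprops.get(prop):
            name = revprops[prop]
            break
    if name is None:
        # Old server, or no extended fields: behave exactly as today.
        return revprops.get("svn:author", "")
    email = revprops.get("svn:author:email")
    if git_style and email:
        return f"{name} <{email}>"
    return name
```

Falling back to plain svn:author whenever no display name comes back keeps an old server paired with a new client behaving as it does today.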

> FURTHER SCOPE
>
>    Does a client need to be able to look up the information in other ways, such as starting from svn:author rather than a revision number, or starting from an extended author field?
>

I'm not clear on how "svn blame" is implemented. Presuming that it knows 
what commit each line belongs to and that these are already being 
queried (i.e. the implementation won't have to significantly change as a 
result of this proposal), it is satisfactory for it to access the 
information from the revision properties. I don't at the moment see a 
requirement to be able to query a list of known users, or information 
for a particular user. Subversion is not a directory service. The main 
capability being provided is to enable Subversion clients to be ignorant 
about how the server has been configured to perform authentication and 
identification of users, but still be able to provide extended 
information about Subversion metadata back to the user. Staying within 
this domain is probably smart, as it provides a clear boundary around 
the scope that is being agreed to.

Final thoughts on this draft:

The reference implementation should come with perhaps two server modules 
to support this capability. One should be a caching LDAP implementation 
that is fully configurable. One should be based on operating system 
services (PAM or getent() for Unix?). Other implementations should be 
possible, but left outside of core.
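As an illustration of the operating-system-services module, here is a Python sketch that derives fields from the passwd database. It mirrors the getent/cut pipeline suggested earlier in the thread. It relies only on the standard pwd module, which is Unix-only, so it sees whatever NSS is configured to provide. The function names are hypothetical. The sketch supplies no email address because passwd has no standard field for one:

```python
import pwd
from typing import Dict


def display_name_from_gecos(gecos: str) -> str:
    """First comma-separated GECOS sub-field, like `cut -d, -f1`."""
    return gecos.split(",", 1)[0].strip()


def lookup_via_passwd(authn_id: str) -> Dict[str, str]:
    """Hypothetical back end: extended author fields from the passwd
    database (local files, NIS, LDAP via NSS, ...)."""
    try:
        entry = pwd.getpwnam(authn_id)
    except KeyError:
        # Unknown user: provide no extended fields at all.
        return {}
    fields = {"svn:author:authn-id": entry.pw_name}
    display_name = display_name_from_gecos(entry.pw_gecos)
    if display_name:
        fields["svn:author:display-name"] = display_name
    return fields
```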

If the Subversion developers agree to some refinement of this proposal, 
I understand that developer resources are limited and that there is no 
guarantee that it would ever be implemented or if implemented that it 
would ever be completed and distributed in core. I'm thinking that this 
sort of project might be a good entry point for somebody such as myself 
to contribute. Not sure about time right now - but if you put in the 
effort to review and refine, then it would be only fair for me to at 
least try to contribute.

Thanks for the time you put into this Julian.

-- 
Mark Mielke<ma...@mielke.cc>

Re: format of svn:author

Posted by Mark Mielke <ma...@mark.mielke.cc>.

This is great, Julian. It is pretty good for a draft. I'll get back to 
you with the detailed answers tonight - just wanted to give my thumbs up.

I'm ok with a simpler solution that just sets the attributes on commit, 
but what you have described looks like a good step up from the minimum 
and solves additional requirements which fall under "would be nice" for 
me... Thanks!


On 01/04/2012 01:42 PM, Julian Foad wrote:
> [...]


-- 
Mark Mielke<ma...@mielke.cc>

Re: format of svn:author

Posted by Mark Mielke <ma...@mark.mielke.cc>.

On 01/05/2012 07:44 AM, Johan Corveleyn wrote:
> On Wed, Jan 4, 2012 at 7:42 PM, Julian Foad<ju...@btopenworld.com>  wrote:
>
> [ ... ]
>
>> SERVER DESIGN
>>    Any time the server is about to send a set of revision properties to the client, the server looks up the extended author fields and adds corresponding properties to the set of revision properties that it reports to the client.  These property values override any stored values of the same names.  The server looks up the extended author fields through some mechanism not defined here, using the value of the "svn:author" property as a key.  The server may cache the results, provided that there is a way for the administrator to make the server use updated information.
> Just wondering: a lookup approach, does that address the original
> problem that started this whole discussion? I.e.: how to avoid the
> information loss when importing from GIT into a Subversion repository?
> Since GIT has those additional attributes (display name and email
> address?) annotated with every commit, a lookup approach is in general
> not sufficient to store this information ...
>
> Not important I think, but I'm just noting the discrepancy ...

I was thinking this as well, but I dismissed it (perhaps prematurely) 
with the thought that GIT, being a DVCS, does not have the capability to 
have a centralized authority for mapping these attributes. 
Subversion is designed for centralization of the metadata, and therefore 
it may be a better fit for the mappings to also be centralized. Somebody 
who is importing GIT to Subversion might choose to do so by selecting an 
appropriate unique identifier for their requirements; they could then 
import the mappings into the centralized record that maps unique 
identifiers to attributes. LDAP or what have you.

Alternatively, the model could normally prevent svn:author:* from being 
set, but if such properties happen to exist, they could be served as 
historical data.

Either way could be made to work. Not sure what is "best".
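For the import route, here is a Python sketch of turning a Git author string into the proposed field names. The regular expression is a simplification of Git's "Name <email>" format and does not cover every string Git accepts. authn_id is whatever unique identifier the importer chose, as described above. The resulting values could either be loaded into the central directory or stored as historical svn:author:* properties:

```python
import re
from typing import Dict

# Simplified form of Git's "Display Name <email>" author identity.
_GIT_IDENT = re.compile(r"^\s*(?P<name>[^<]*?)\s*<(?P<email>[^>]*)>\s*$")


def extended_fields_from_git_author(ident: str, authn_id: str) -> Dict[str, str]:
    """Map a Git author identity to svn:author:* field names."""
    fields = {"svn:author:authn-id": authn_id}
    match = _GIT_IDENT.match(ident)
    if not match:
        # Not in "Name <email>" form: keep the whole string as the name.
        fields["svn:author:display-name"] = ident.strip()
        return fields
    if match.group("name"):
        fields["svn:author:display-name"] = match.group("name")
    if match.group("email"):
        fields["svn:author:email"] = match.group("email")
    return fields
```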

-- 
Mark Mielke<ma...@mielke.cc>

Re: format of svn:author

Posted by Johan Corveleyn <jc...@gmail.com>.

On Wed, Jan 4, 2012 at 7:42 PM, Julian Foad <ju...@btopenworld.com> wrote:

[ ... ]

> SERVER DESIGN
>   Any time the server is about to send a set of revision properties to the client, the server looks up the extended author fields and adds corresponding properties to the set of revision properties that it reports to the client.  These property values override any stored values of the same names.  The server looks up the extended author fields through some mechanism not defined here, using the value of the "svn:author" property as a key.  The server may cache the results, provided that there is a way for the administrator to make the server use updated information.

Just wondering: a lookup approach, does that address the original
problem that started this whole discussion? I.e.: how to avoid the
information loss when importing from GIT into a Subversion repository?
Since GIT has those additional attributes (display name and email
address?) annotated with every commit, a lookup approach is in general
not sufficient to store this information ...

Not important I think, but I'm just noting the discrepancy ...

-- 
Johan

Re: format of svn:author

Posted by Julian Foad <ju...@btopenworld.com>.

Hi Mark.

I think I can see to some extent what you are getting at, but not clearly.  We all need a common frame of reference for understanding why and how some sort of extended author information could be useful.  To help us get there, I put together the following tentative proposal to act as a basis for discussion.  Perhaps we can now move on to talking about specific requirements and designs.  What parts of it are aligned with your thinking and what have I got wrong or missed out?

Please note that this draft is purely an invention of my mind and I do not expect it to be an accurate reflection of your or anyone else's requirements.


A PROPOSAL FOR EXTENDED AUTHOR IDENTIFICATION

USE CASES

1. [This one I am aware of.]

  A large company has authenticated user ids that are numeric.  That means the "log" and "blame" information shown by most Subversion clients is not easy to understand.  Therefore they use a (post-commit?) hook to change the svn:author property to a more friendly string, which (mostly) solves the display issue.  However, it causes other problems.  [What problems?]


2. [This one is a guess.]

  The leader of a small development team sharing a Subversion repository with other teams wants to set up a build slave that will send an email to the users who committed revisions leading to a build failure.  The machine can see the Subversion user id but how can it get the user's email address?  The team leader could ask the repository administrator to add a post-commit hook that adds an email address to a revision property after every commit, but that

    * requires involving the server admin;
    * won't get updated when the user changes their email address;
    * won't work for testing old revisions that were already committed before that time;
    * won't work if the build slave software needs to read a list of all user id->email mappings at once.


3. [This one is a guess.]

  An administrator wants to integrate Subversion with an issue tracker.  Users have different user ids on the two tools.  The admin wants to configure the tracker so that it automatically annotates an already committed Subversion revision with some status information.  How can the tracker know with what user id to contact the Subversion server?

The rest of the proposal addresses UC1 and part of UC2 but not UC3.  (UC3 looks like it needs some totally separate solution, outside of Subversion.)



REQUIREMENTS

  A Subversion client (of any kind so designed) shall be able to read extended information about the author of a revision.  This information shall consist of a (possibly empty) set of fields.  The set of possible extended author fields shall include at least:

    * authenticated user id

    * display name
    * email address

  It shall be possible to add other fields on the server side (by software upgrade and/or by configuration), and for a client (of any kind so designed) to discover and read these fields without any software upgrade on the client side.

  The svn:author property shall continue to exist.  When not using the extended author fields, the svn:author property must continue to operate as before.  When using the extended author fields, the design may restrict the use of the svn:author field.  Example: the design could require that if extended author fields are to be usable then the svn:author field always holds the authenticated user id and must always be present and non-empty.


  A client shall access the extended author fields through the Subversion server, through the existing client-server protocols, possibly with protocol extensions.  Any protocol extensions shall be backward compatible in that an old server with a new client or an old client with a new server shall (without user intervention) use the old 'svn:author' property.


  The fields that are available from a particular server or repository are determined by the administrator.  For any particular committed revision, the server may provide any or all or none of the extended author fields.  A client cannot rely on any particular field being available except to the extent that the administrator gives such an assurance.  Example: if the client requests the authenticated user id and email address for a revision whose author has no email address recorded, the server shall provide the authenticated user id but no email address.  If the server is temporarily unable to look up any information about a user, the server should respond with no extended author fields instead of waiting.
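  The last two requirements (partial results pass through; a slow lookup yields no fields rather than a delay) can be sketched in Python as follows.  This is only an illustration, not Subversion code: the `lookup` callable is a hypothetical stand-in for whatever directory the server consults, the 0.5-second budget is arbitrary, and treating a failing lookup like an unavailable one is an assumption the proposal does not state.

```python
import concurrent.futures
from typing import Callable, Dict

# Hypothetical directory interface: svn:author value -> extended fields
# (any subset; an author with no email simply has no "svn:author:email").
Lookup = Callable[[str], Dict[str, str]]

# Shared worker pool so a slow directory never blocks the request thread.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def extended_author_fields(author: str, lookup: Lookup,
                           timeout_s: float = 0.5) -> Dict[str, str]:
    """Return whatever fields the lookup yields within timeout_s, else {}."""
    future = _POOL.submit(lookup, author)
    try:
        # Partial results pass through unchanged: missing fields stay missing.
        return dict(future.result(timeout=timeout_s))
    except concurrent.futures.TimeoutError:
        # Too slow: answer now with no extended fields (the worker may still
        # finish in the background; its result is discarded).
        return {}
    except Exception:
        # Assumption: a failing directory is treated like an unavailable one.
        return {}
```

  The client never sees an error for a missing field; it just gets fewer properties, which matches "a client cannot rely on any particular field being available".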


  The extended author fields are dynamic in the sense that the server need not always return the same values for the same committed revision.  For example, a client might repeat exactly the same request for information about revision 1234 twice in quick succession, and the server might provide the email address as "a@b.c" the first time and "dd@ee.ff" the second time.  Even the "authenticated user id" field could change.


DESIGN

  The extended author fields are delivered through revision properties.  The values are UTF-8 text.  These revision properties are readable but not writable by clients.

  Three property names are initially designated as "well known":

    * prop name: "svn:author:authn-id"
      purpose: authenticated user id
      format: as used by Subversion's authentication (the default
        value of svn:author)

    * prop name: "svn:author:display-name"
      purpose: display name
      format: a single line (no line breaks), e.g. person's full
        name or shortened name or nickname

    * prop name: "svn:author:email"
      purpose: email address
      format: [TO BE SPECIFIED HERE]


  Other property names in this name space beginning with "svn:author:" can be designated as "well known" in the future, by an official announcement from the Subversion project.

  An administrator can configure other extended author fields to use property names that are not in the "svn:" name space.  Example: an administrator could configure the property name "author:pgp-sig" to hold the author's PGP signature.



SERVER DESIGN
  Any time the server is about to send a set of revision properties to 
the client, the server looks up the extended author fields and adds 
corresponding properties to the set of revision properties that it 
reports to the client.  These property values override any stored values of the same name.  The server looks up the extended author fields through some mechanism not defined here, using the value of the "svn:author" property as a key.  The server may cache the results, provided that there is a way for the administrator to make the server use updated information.
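  As a rough Python sketch of this server-side step (not Subversion's actual code; the class name and the `lookup` callable are hypothetical): stored revprops are copied, the svn:author value is used as the cache key for the lookup, looked-up fields win over stored ones of the same name, and `flush()` plays the role of the administrator's "use updated information" switch.

```python
from typing import Callable, Dict

# Hypothetical directory interface, keyed by the stored svn:author value.
Lookup = Callable[[str], Dict[str, str]]


class ExtendedAuthorOverlay:
    """Adds looked-up extended author fields to a revision's revprops."""

    def __init__(self, lookup: Lookup) -> None:
        self._lookup = lookup
        self._cache: Dict[str, Dict[str, str]] = {}

    def flush(self) -> None:
        # Administrator action: forget cached results so the next request
        # performs a fresh lookup.
        self._cache.clear()

    def revprops_for_client(self, stored: Dict[str, str]) -> Dict[str, str]:
        result = dict(stored)  # never modify what is stored in the repository
        author = stored.get("svn:author")
        if not author:
            return result  # nothing to key the lookup on
        if author not in self._cache:
            self._cache[author] = dict(self._lookup(author))
        # Looked-up values override stored values of the same name.
        result.update(self._cache[author])
        return result
```

  Keeping the overlay on the read path means the stored revision properties are never rewritten, which is what allows the values to be "dynamic" in the sense of the requirements.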


  If the client attempts to set any revision property in the "svn:author:" name space, the server shall report an error to the client.  This applies even if the property value matches the value that was last read from the server or is currently known to the server, and even if the 
specific property name is not known to the server.  If the client attempts to set any revision property that is not in the "svn:author:" name space but might be configured as an extended author field, the server records that revision property in the normal way.  If a revision property (of any name) has a stored value and the extended author field look-up also provides a value for the same property name, the latter takes priority.
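  The write rule above reduces to a prefix check.  A minimal Python sketch, with hypothetical names (`check_client_revprop_write`, `store_revprop`, and an in-memory dict standing in for the repository): anything under "svn:author:" is refused regardless of value or whether the server knows the name, while other names, including admin-configured ones such as "author:pgp-sig", are stored normally.  Note that plain "svn:author" (no trailing colon) is outside the reserved space.  For changes to already committed revisions, Subversion's existing pre-revprop-change hook is where such a check would live today.

```python
from typing import Dict

RESERVED_PREFIX = "svn:author:"


class RevpropWriteRejected(Exception):
    """Raised instead of storing a client-supplied extended author field."""


def check_client_revprop_write(name: str) -> None:
    # Any name in the "svn:author:" space is rejected, whether or not the
    # server knows it and whatever value the client sends.
    if name.startswith(RESERVED_PREFIX):
        raise RevpropWriteRejected(f"revision property {name!r} is read-only")


def store_revprop(store: Dict[str, str], name: str, value: str) -> None:
    check_client_revprop_write(name)
    # Names outside "svn:author:" are stored normally, even if configured as
    # extended author fields; on read, the look-up value still takes priority.
    store[name] = value
```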


  The extended author fields [are | are not] available to the following hook scripts: pre-commit, ...


CLIENT DESIGN

  Just an example.  The "svn log" and "svn blame" commands could request the revision property named "svn:author:display-name", and if that is returned then use it instead of "svn:author", otherwise use the value of "svn:author".  Further, a client-side configuration option could specify which property name should be used for these display purposes, so for example some users in a particular team could choose to have the "author:nickname" revision property displayed instead of "svn:author:display-name".
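  The display rule in this example, sketched in Python.  The `display_prop` parameter models the hypothetical client-side configuration option; the "(no author)" placeholder mirrors what "svn log" prints today for a revision without svn:author.  Falling back to svn:author whenever the preferred property is absent also covers the old-server case from the compatibility requirement.

```python
from typing import Mapping, Optional

DEFAULT_DISPLAY_PROP = "svn:author:display-name"


def author_for_display(revprops: Mapping[str, str],
                       display_prop: Optional[str] = None) -> str:
    """Pick the author string that "svn log"/"svn blame" would show.

    display_prop stands in for a hypothetical client configuration option,
    e.g. "author:nickname"; when unset, the well-known name is tried.
    """
    name = revprops.get(display_prop or DEFAULT_DISPLAY_PROP)
    if name:
        return name
    # Old server, or no extended fields for this revision: behave as today.
    return revprops.get("svn:author", "(no author)")
```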



FURTHER SCOPE

  Does a client need to be able to look up the information in other ways, such as starting from svn:author rather than a revision number, or starting from an extended author field?


- Julian

Re: format of svn:author

Posted by Branko Čibej <br...@apache.org>.

On 04.01.2012 13:50, Mark Mielke wrote:
> Branko: If "svn log", "svn blame", and anything like TortoiseSVN or
> Subclipse were to support this, you might have a point. As it is,
> anybody with teams large enough such that the unique identifier is not
> recognizable (i.e. committer A immediately recognizes and knows that
> unique identifier for committer B) needs to FUDGE svn:author to
> include additional information which is not really part of the unique
> identifier at all and is only a humanly representable version of the
> unique identifier, and this leads to:
>
> 1) Breakage in other tools. Committer mappings don't work.
> 2) The unique identifier is now not correct as it includes non-unique,
> non-permanent details that change.

I understand all this, but how do you propose that, e.g., "svn blame"
would guess /which/ of the alternative identification tokens it's
supposed to show? If you don't want to always show the unique ID, then
obviously you'd choose one of the alternatives based on ... what? The
identity of the invoker of the command? Some other criterion? Things
quickly become horribly hairy.

Without specific use cases and examples, it's hard to come up with any
kind of coherent identification scheme that's different from what we
have now.

-- Brane

Re: format of svn:author

Posted by Mark Mielke <ma...@mark.mielke.cc>.

Branko: If "svn log", "svn blame", and anything like TortoiseSVN or 
Subclipse were to support this, you might have a point. As it is, 
anybody with teams large enough such that the unique identifier is not 
recognizable (i.e. committer A immediately recognizes and knows that 
unique identifier for committer B) needs to FUDGE svn:author to include 
additional information which is not really part of the unique identifier 
at all and is only a humanly representable version of the unique 
identifier, and this leads to:

1) Breakage in other tools. Committer mappings don't work.
2) The unique identifier is now not correct as it includes non-unique, 
non-permanent details that change.

But now I'm repeating myself. I think the problem here is that people 
are theorizing about subjects for which they have not had to deal with 
real-life problems. Theory vs practice. You can say that enterprises like 
to repeat things - but I'm not sure you understand what the enterprise is 
doing... it just looks like repeating to you, and so you assume it has 
no purpose, and therefore no merit.

On 01/04/2012 05:35 AM, Branko Čibej wrote:
> On 04.01.2012 11:09, Vincent Lefevre wrote:
>> On 2012-01-03 15:44:47 +0100, Branko Čibej wrote:
>>> I think this whole thread is slightly bogus. It should be obvious that
>>> whatever is in the svn:author field had better be a unique identifier of
>>> the person responsible for the commit, regardless of how it gets there.
>> I'd say that this choice should entirely be made by the administrator
>> of the repository.
> Exactly. And we give that choice, at least for Apache-embedded servers
> (which is what enterprises will use, I hope).
>
> If we, say, added another property where admins could write a whole
> other set of information, we'd either have to define the format (and
> incidentally tee off the 90% who want a different format), or leave the
> contents up to the administrator (and tee off the other 90% who want
> compatibility across diverse installations).
>
> I still don't understand why it's so hard for other tools to, e.g., look
> up the svn:author unique ID on an LDAP server somewhere. Otherwise we're
> effectively duplicating (a small part of) any of the "standard"
> directory services.
>
> (Yeah, I know that "enterprise" tools like to duplicate functionality
> and mess up open standards while they're at it, but I don't see why we
> should be doing the same.)
>
> -- Brane

-- 
Mark Mielke <ma...@mielke.cc>

Re: format of svn:author

Posted by Branko Čibej <br...@apache.org>.

On 04.01.2012 11:09, Vincent Lefevre wrote:
> On 2012-01-03 15:44:47 +0100, Branko Čibej wrote:
>> I think this whole thread is slightly bogus. It should be obvious that
>> whatever is in the svn:author field had better be a unique identifier of
>> the person responsible for the commit, regardless of how it gets there.
> I'd say that this choice should entirely be made by the administrator
> of the repository.

Exactly. And we give that choice, at least for Apache-embedded servers
(which is what enterprises will use, I hope).

If we, say, added another property where admins could write a whole
other set of information, we'd either have to define the format (and
incidentally tee off the 90% who want a different format), or leave the
contents up to the administrator (and tee off the other 90% who want
compatibility across diverse installations).

I still don't understand why it's so hard for other tools to, e.g., look
up the svn:author unique ID on an LDAP server somewhere. Otherwise we're
effectively duplicating (a small part of) any of the "standard"
directory services.
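For comparison, the tool-side lookup suggested here might look like the following Python sketch.  It uses the Unix-only `pwd` module (the local passwd database, the same idea as a `getent passwd ... | cut -d: -f5 | cut -d, -f1` pipeline) as a stand-in for an LDAP directory; a real tool would query its organization's own directory, and `display_name_for` is a hypothetical helper name.

```python
import pwd
from functools import lru_cache


def gecos_display_name(gecos: str) -> str:
    """First comma-separated GECOS field, as in `cut -d, -f1`."""
    return gecos.split(",", 1)[0].strip()


@lru_cache(maxsize=None)
def display_name_for(author: str) -> str:
    """Resolve an svn:author unique ID to a display name via the local
    passwd database (standing in for LDAP); fall back to the ID itself."""
    try:
        entry = pwd.getpwnam(author)
    except KeyError:
        return author  # unknown to this directory: show the raw ID
    return gecos_display_name(entry.pw_gecos) or author
```

The cache keeps repeated "log"/"blame" rendering cheap; `display_name_for.cache_clear()` discards it when the directory changes.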

(Yeah, I know that "enterprise" tools like to duplicate functionality
and mess up open standards while they're at it, but I don't see why we
should be doing the same.)

-- Brane

Re: format of svn:author

Posted by Vincent Lefevre <vi...@vinc17.net>.

On 2012-01-03 15:44:47 +0100, Branko Čibej wrote:
> I think this whole thread is slightly bogus. It should be obvious that
> whatever is in the svn:author field had better be a unique identifier of
> the person responsible for the commit, regardless of how it gets there.

I'd say that this choice should entirely be made by the administrator
of the repository. For instance, for my personal repository, I am
the only person who commits (that's the definition of a personal
repository), so that I choose to put in svn:author the machine (or
network) from which I do the commit.

-- 
Vincent Lefèvre <vi...@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)

Re: format of svn:author

Posted by Mark Mielke <ma...@mark.mielke.cc>.

On 01/02/2012 04:48 AM, Alan Barrett wrote:
> On Mon, 02 Jan 2012, Mark Mielke wrote:
>>> If your third party tools can't extract the unique ID from 
>>> svn:author = "Display Name <un...@domain>" then perhaps the 
>>> problem lies at least as much in your third party tools as in 
>>> subversion.
>>
>> I wonder if you thought this through before posting. :-)
>>
>> You are saying that if I make up an essentially arbitrary scheme, 
>> such as "Display Name <un...@domain>", and you have a tool which 
>> is unaware of my scheme, and therefore your tool fails to matches 
>> users in the region because of my scheme - that your tool has the 
>> problem?
>
> It's a free text field, although it's probably a bad idea to put more 
> than one line of text there.  As the administrator who sets up the svn 
> repository, you are responsible for choosing what text you put in 
> svn:author.  If, as you said, you have tools that want to be able to 
> map it to a more restricted type, such as a login name, or employee 
> number, or (part of) an email address, then the tool is responsible 
> for performing the mapping.  If the tool can't perform the mapping 
> then, yes, I say that the tool is incompatible with the way the 
> repository administrator has chosen to use the svn:author field.

No. I don't control the hundreds of tools that support Subversion. The 
tools cannot be responsible for conventions they are unaware of. I think 
you are thinking of the tiny little scope where the only components in 
the system are Subversion itself and tools that I (or you) are directly 
responsible for and have the power to change. This is an extremely small 
view of the problem.

>> Otherwise, only extremely casual interpretation can be done of the 
>> field. For example, it can be treated as a unique identifier - but 
>> more like a "foreign key" unique identifier in the sense that it is a 
>> key in some domain, but not necessarily a domain I know about or am 
>> an authority for.
> As the administrator who sets up the svn repository, and the hooks 
> that edit or validate the data as it goes into the svn:author field, 
> you have absolute control over the data format, so it's not fair to 
> say that it's in a domain that you don't know about -- It's in a 
> domain that you choose.  Whatever format you choose, you should make 
> sure your other tools can deal with it.

Only in the extremely small view that I describe above. So not really 
relevant to the real requirements.

>
>> Our exact compromise for the last three years is:
>>
>> 1) original svn:author value arrives on the server as "1234567" - 
>> a corporate unique identifier
>> 2) pre-commit re-writes svn:author to "Full Name (<original 
>> svn:author value>)"
>> 3) pre-commit adds <company>:gid as "<original svn:author value>"
>>
>> Then as I mention - various other tools such as FishEye have explicit 
>> mappings from "Mark Mielke (1234567)" => "1234567" for each 
>> Subversion repository. We're primarily a ClearCase and Perforce shop 
>> right now - but even so, I have several Subversion repository 
>> mappings of this form. It works. It just sucks.
>
> If FishEye needs a huge mapping table from "text as it appears in 
> svn:author" to "unique id", with a row in the table for every possible 
> ID, then this process will be very painful for you; on the other hand, 
> if you could configure FishEye to extract the "1234567" from "Mark 
> Mielke (1234567)" using a regular expression or other string 
> manipulation technique, then it would be much more maintainable.

It is not reasonable for a Subversion user to customize every tool they 
use. It is far preferred for Subversion to provide the solution as a 
core function.

> I expect that changes on the subversion side could help (as you have 
> mentioned, adding more properties, or clearly documenting one or more 
> suggested ways of providing structure inside svn:author, or both), but 
> I still hold the opinion that your pain is caused at least as much by 
> FishEye as by svn.

More than help. It is the only true solution. Anything else - such as 
each Subversion user customizing their own tools - is entirely a hack.

-- 
Mark Mielke <ma...@mielke.cc>

Re: format of svn:author

Posted by Alan Barrett <ap...@cequrux.com>.

On Mon, 02 Jan 2012, Mark Mielke wrote:
>> If your third party tools can't extract the unique ID from 
>> svn:author = "Display Name <un...@domain>" then perhaps the 
>> problem lies at least as much in your third party tools as in 
>> subversion.
>
> I wonder if you thought this through before posting. :-)
>
> You are saying that if I make up an essentially arbitrary 
> scheme, such as "Display Name <un...@domain>", and you have 
> a tool which is unaware of my scheme, and therefore your tool 
> fails to match users in the region because of my scheme - that 
> your tool has the problem?

It's a free text field, although it's probably a bad idea to put 
more than one line of text there.  As the administrator who sets 
up the svn repository, you are responsible for choosing what text 
you put in svn:author.  If, as you said, you have tools that want 
to be able to map it to a more restricted type, such as a login 
name, or employee number, or (part of) an email address, then the 
tool is responsible for performing the mapping.  If the tool can't 
perform the mapping then, yes, I say that the tool is incompatible 
with the way the repository administrator has chosen to use the 
svn:author field.

> Otherwise, only extremely casual interpretation can be done of 
> the field. For example, it can be treated as a unique identifier 
> - but more like a "foreign key" unique identifier in the sense 
> that it is a key in some domain, but not necessarily a domain I 
> know about or am an authority for.

As the administrator who sets up the svn repository, and the hooks 
that edit or validate the data as it goes into the svn:author 
field, you have absolute control over the data format, so it's not 
fair to say that it's in a domain that you don't know about -- 
It's in a domain that you choose.  Whatever format you choose, you 
should make sure your other tools can deal with it.

> Our exact compromise for the last three years is:
>
> 1) original svn:author value arrives on the server as 
> "1234567" - a corporate unique identifier
> 2) pre-commit re-writes svn:author to "Full Name (<original 
> svn:author value>)"
> 3) pre-commit adds <company>:gid as "<original svn:author 
> value>"
>
> Then as I mention - various other tools such as FishEye have 
> explicit mappings from "Mark Mielke (1234567)" => "1234567" for 
> each Subversion repository. We're primarily a ClearCase and 
> Perforce shop right now - but even so, I have several Subversion 
> repository mappings of this form. It works. It just sucks.

If FishEye needs a huge mapping table from "text as it appears 
in svn:author" to "unique id", with a row in the table for every 
possible ID, then this process will be very painful for you; on 
the other hand, if you could configure FishEye to extract the 
"1234567" from "Mark Mielke (1234567)" using a regular expression 
or other string manipulation technique, then it would be much more 
maintainable.
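A sketch of the "regular expression or other string manipulation" approach, written in Python against the "Full Name (1234567)" convention described above.  The function names are hypothetical, and the pattern assumes the unique ID is the final parenthesised group and is purely numeric; accepting a bare numeric value (for example a revision committed before the rewrite hook existed) is an additional assumption.

```python
import re
from typing import Optional

# "Full Name (1234567)": the ID is the last parenthesised, all-digit group.
_AUTHOR_RE = re.compile(r"(?P<name>.*\S)\s+\((?P<uid>[0-9]+)\)")


def format_author(full_name: str, uid: str) -> str:
    """What the pre-commit rewrite of svn:author produces."""
    return f"{full_name} ({uid})"


def extract_uid(author: str) -> Optional[str]:
    """Recover the unique ID from an svn:author value, or None."""
    m = _AUTHOR_RE.fullmatch(author)
    if m:
        return m.group("uid")
    if re.fullmatch(r"[0-9]+", author):
        return author  # assumption: an unrewritten bare ID
    return None
```

Since the same hook also copies the original ID into the <company>:gid revision property, a tool that can read arbitrary revision properties would not need the pattern at all; the regex only helps tools limited to svn:author.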

I expect that changes on the subversion side could help (as you 
have mentioned, adding more properties, or clearly documenting one 
or more suggested ways of providing structure inside svn:author, 
or both), but I still hold the opinion that your pain is caused at 
least as much by FishEye as by svn.

--apb (Alan Barrett)

Re: format of svn:author

Posted by Mark Mielke <ma...@mark.mielke.cc>.

To be blunt - this is exactly why Subversion will stay small. When the 
main people on the developer list hold small world views such as "it is 
the responsibility of the organization that uses Subversion to customize 
the dozens of tools they integrate with in a non-standard way", it is 
guaranteed that Subversion adoption cannot go beyond a certain 
threshold. Which is fine. Sometimes you need small. It is simply not 
feasible for every organization to customize every tool they use. The 
thought itself is ridiculous.

But if this is truly the opinion, then my efforts here are wasted. Other 
solutions provide these capabilities out of box.

On 01/03/2012 09:44 AM, Branko Čibej wrote:
> On 03.01.2012 04:02, Stefan Fuhrmann wrote:
>> * What is an author?
>> * How do concepts like "account", "person",
>>    "role", "group" relate to that notion?
>> * What aspects of the above can be provided to /
>>    handled by Subversion in a portable way?
>> * What are typical use-cases and do they match
>>    with the definitions you use?
>>
>>>      svn:author =>  unique identifier
>> That seems to be the hardest to define and may
>> be difficult to provide. Identifies the person?
>> PGP key ID?
>>>      svn:author-name =>  Mark Mielke
>> That would denote the "person". How would
>> duplicates and name changes be handled?
>>>      svn:author-email =>  mark@mark.mielke.cc
>> That looks close to the "account" aspect.
> I think this whole thread is slightly bogus. It should be obvious that
> whatever is in the svn:author field had better be a unique identifier of
> the person responsible for the commit, regardless of how it gets there.
> Once that requirement is met, everything else is "simply" a matter of
> getting the repository administrator to set up that identifier in such a
> way that the tools used by the users of that repository can do something
> useful with it.
>
> I propose that this is /entirely/ in the domain of the organization that
> is maintaining the Subversion installation. There is no standard way of
> identifying all pertinent user information -- or rather, there are some
> 57 different standards. There's nothing stopping the repository
> administrator from writing a pre-commit hook that adds tailored revprops
> with identifiers that comply with all those 57 standards and any custom
> ones, too. Asking Subversion to add reserved revprop names for all
> possible (not even plausible) identification schemes would be a bit like
> asking to add a different boolean property for every known character
> encoding -- in other words, an explosion of reserved property names that
> would, in general, give no benefit to the vast majority of users.
>
> All that would happen is that different organizations would abuse those
> property names in different, incompatible ways.
>
> -- Brane


-- 
Mark Mielke <ma...@mielke.cc>

Re: format of svn:author

Posted by Mark Mielke <ma...@mark.mielke.cc>.

FYI that "full name" or "email address" are not actually aspects of a 
unique identifier. People's names change, and email addresses change. 
The unique identifier should normally be much more persistent and should 
enable cross referencing with other tools and database reports. The name 
and email is for human consumption. The unique identifier is for machine 
consumption. Subversion has chosen to define only one attribute to 
represent both which makes it extremely difficult to get either. We're 
talking about changing customizing the software for dozens of open 
source and commercial products, just to make the "full name" visible to 
users. But without a standard or convention - we're talking about each 
organization defining their own standard or convention and providing 
their own customization to dozens of tools. This works against the open 
source community being leveraged to provide solutions which benefit the 
most people from a shared component.

In the below - Branko seems to suggest that because there is a lot of 
material on this "out there" and lots of choices, therefore it isn't the 
place for Subversion to step in and choose one to adopt. I suggest that 
the reason there is a wealth of material out there is because the 
subject is important and that the reason a standard is preferred is 
specifically because it allows integration between many tools from many 
providers.

On 01/03/2012 09:44 AM, Branko Čibej wrote:
> I think this whole thread is slightly bogus. It should be obvious that
> whatever is in the svn:author field had better be a unique identifier of
> the person responsible for the commit, regardless of how it gets there.
> Once that requirement is met, everything else is "simply" a matter of
> getting the repository administrator to set up that identifier in such a
> way that the tools used by the users of that repository can do something
> useful with it.
>
> I propose that this is /entirely/ in the domain of the organization that
> is maintaining the Subversion installation. There is no standard way of
> identifying all pertinent user information -- or rather, there are some
> 57 different standards. There's nothing stopping the repository
> administrator from writing a pre-commit hook that adds tailored revprops
> with identifiers that comply with all those 57 standards and any custom
> ones, too. Asking Subversion to add reserved revprop names for all
> possible (not even plausible) identification schemes would be a bit like
> asking to add a different boolean property for every known character
> encoding -- in other words, an explosion of reserved property names that
> would, in general, give no benefit to the vast majority of users.
>
> All that would happen is that different organizations would abuse those
> property names in different, incompatible ways.
>
> -- Brane

-- 
Mark Mielke <ma...@mielke.cc>

Re: format of svn:author

Posted by Branko Čibej <br...@apache.org>.

On 03.01.2012 04:02, Stefan Fuhrmann wrote:
> * What is an author?
> * How do concepts like "account", "person",
>   "role", "group" relate to that notion?
> * What aspects of the above can be provided to /
>   handled by Subversion in a portable way?
> * What are typical use-cases and do they match
>   with the definitions you use?
>
>>     svn:author => unique identifier
> That seems to be the hardest to define and may
> be difficult to provide. Identifies the person?
> PGP key ID?
>>     svn:author-name => Mark Mielke
> That would denote the "person". How would
> duplicates and name changes be handled?
>>     svn:author-email => mark@mark.mielke.cc
> That looks close to the "account" aspect.

I think this whole thread is slightly bogus. It should be obvious that
whatever is in the svn:author field had better be a unique identifier of
the person responsible for the commit, regardless of how it gets there.
Once that requirement is met, everything else is "simply" a matter of
getting the repository administrator to set up that identifier in such a
way that the tools used by the users of that repository can do something
useful with it.

I propose that this is /entirely/ in the domain of the organization that
is maintaining the Subversion installation. There is no standard way of
identifying all pertinent user information -- or rather, there are some
57 different standards. There's nothing stopping the repository
administrator from writing a pre-commit hook that adds tailored revprops
with identifiers that comply with all those 57 standards and any custom
ones, too. Asking Subversion to add reserved revprop names for all
possible (not even plausible) identification schemes would be a bit like
asking to add a different boolean property for every known character
encoding -- in other words, an explosion of reserved property names that
would, in general, give no benefit to the vast majority of users.

All that would happen is that different organizations would abuse those
property names in different, incompatible ways.

-- Brane

Re: format of svn:author

Posted by Stefan Fuhrmann <eq...@web.de>.

On 02.01.2012 09:34, Mark Mielke wrote:
> On 01/02/2012 02:52 AM, Alan Barrett wrote:
>> On Sun, 01 Jan 2012, Mark Mielke wrote:
>>>> Another idea is to change the revprop's value in the pre-commit or 
>>>> post-commit hook: [...]
>>>
>>> This is what we've been doing for about two years. It has the 
>>> consequence that tools don't automatically match unique identifier 
>>> to commit as they no longer match.
>>
>> If your third party tools can't extract the unique ID from svn:author 
>> = "Display Name <un...@domain>" then perhaps the problem lies at 
>> least as much in your third party tools as in subversion.
>
> I wonder if you thought this through before posting. :-)
>

Hi Mark,

I only loosely followed this thread but I still want
to throw my 2 cents in here.

It seems that you are looking for a formal way
(i.e. accessible to tools) to identify the "author"
of a change and you listed some problems that
you are facing with the current state of things.
To me, it seems that there actually *is* plenty of room
for improvement and simply nobody really brought
the topic up until now.

If you are working towards a proposal (problem
description + proposed solution), make sure you
start your analysis with the very basics, such as:

* What is an author?
* How do concepts like "account", "person",
   "role", "group" relate to that notion?
* What aspects of the above can be provided to /
   handled by Subversion in a portable way?
* What are typical use-cases and do they match
   with the definitions you use?

>     svn:author => unique identifier
That seems to be the hardest to define and may
be difficult to provide. Identifies the person?
PGP key ID?
>     svn:author-name => Mark Mielke
That would denote the "person". How would
duplicates and name changes be handled?
>     svn:author-email => mark@mark.mielke.cc
That looks close to the "account" aspect.

-- Stefan^2.

Re: format of svn:author

Posted by Mark Mielke <ma...@mark.mielke.cc>.

On 01/02/2012 02:52 AM, Alan Barrett wrote:
> On Sun, 01 Jan 2012, Mark Mielke wrote:
>>> Another idea is to change the revprop's value in the pre-commit or 
>>> post-commit hook: [...]
>>
>> This is what we've been doing for about two years. It has the 
>> consequence that tools don't automatically match unique identifier to 
>> commit as they no longer match.
>
> If your third party tools can't extract the unique ID from svn:author 
> = "Display Name <un...@domain>" then perhaps the problem lies at 
> least as much in your third party tools as in subversion.

I wonder if you thought this through before posting. :-)

You are saying that if I make up an essentially arbitrary scheme, such 
as "Display Name <un...@domain>", and you have a tool which is 
unaware of my scheme, and therefore your tool fails to match users in 
the region because of my scheme - that your tool has the problem? 
Despite the documentation for Subversion never mentioning or even 
suggesting a convention that you should be responsible for understanding?

No.

The convention must be defined in the Subversion book, and it must be 
part of the release notes so that third party tools adhere to the 
convention.

Otherwise, only a very casual interpretation of the field is possible. 
For example, it can be treated as a unique identifier - but more 
like a "foreign key" unique identifier, in the sense that it is a key in 
some domain, but not necessarily a domain I know about or am an 
authority for. This is why tools such as FishEye provide a "committer 
mapping" that is precisely this. It lets me enter, on a per-repository 
basis, each of the committer values that I want to associate with my 
own FishEye account. This is really horrible for dozens of repositories 
and thousands of users. Every user having to input their own mappings? 
Yuck, yuck, yuck.

If, instead, a convention was defined such that (and just hand waving 
here, I'm not really attached to these details):

     svn:author => unique identifier
     svn:author-name => Mark Mielke
     svn:author-email => mark@mark.mielke.cc

Then tools could make much more intelligent decisions on what to do or 
show. They could use svn:author as the mapping key, but show name and 
email in "svn log" or graphical browsers.
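As a rough illustration, here is how a log viewer might choose what to display under such a convention: prefer the proposed svn:author-name and svn:author-email when present, and fall back to the raw svn:author for older revisions. This is a minimal sketch assuming the hand-waved property names above; `display_author` is a hypothetical name, and fetching the values (for example with `svn propget PROPNAME --revprop -r REV URL`) is left to the caller.

```sh
# Hypothetical helper: choose the string a tool shows for one revision.
#   $1 = svn:author        (unique identifier, used as the mapping key)
#   $2 = svn:author-name   (proposed revprop; may be empty)
#   $3 = svn:author-email  (proposed revprop; may be empty)
display_author() {
    if [ -n "$2" ] && [ -n "$3" ]; then
        printf '%s <%s>\n' "$2" "$3"    # name and address
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"              # name only
    else
        printf '%s\n' "$1"              # older revisions: show the key itself
    fi
}
```

For example, `display_author 1234567 "Mark Mielke" "mark@example.com"` prints `Mark Mielke <mark@example.com>`, while `display_author 1234567 "" ""` prints `1234567`. The key in `$1` is never altered, so cross-referencing between tools keeps working whatever is displayed.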

The above model is a simple solution to the problem: more data is stored 
for every commit, data which can be used by downstream tools. This has a 
benefit in that the data is static, which is sometimes good. In a large 
project there is normally turnover, and the accounts that exist or are 
active in one year are not necessarily the same as the ones active in 
another year. By taking a snapshot of the data at the time of commit, it 
represents a permanent record of sorts. ClearCase is a system which does 
it this way: its event history records, which track such things as object 
creation (the closest equivalent to svn:author), hold the username, domain 
(NIS - old school), and full name.

The other alternative is for a Subversion client to be able to lookup 
details for svn:author by asking the server using a published protocol. 
This model would allow the server to implement these queries 
transparently using LDAP lookups or similar depending on the 
requirements of the project. This stores less data for every commit, and 
allows for dynamic updates. It would allow for "Mark Mielke" to become 
"Mielke, Mark" with a server side configuration, but in contrast to the 
previous method, it would not allow for a snapshot of history to be taken. 
It would be a requirement that the identity management system used on 
the server would always have a record for me even after I am gone, or, 
alternatively, that the detail would become more vague over time: I 
disappear, my account disappears, and what is left is only a unique 
identifier, which might not be enough information.
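On a Unix server, a crude version of this lookup model can be approximated through the name service switch, which can be backed by LDAP as easily as by a local passwd file. The sketch below is hypothetical (`lookup_display_name` is not a Subversion interface); like Daniel's hook snippet, it keeps only the first comma-separated GECOS field, and its fallback shows the degradation described above: once the account is gone, only the bare identifier is left.

```sh
# Hypothetical resolver: map an svn:author value to a display name at
# read time, using whatever backs NSS on this host (files, LDAP, ...).
lookup_display_name() {
    # Field 5 of a passwd entry is GECOS; keep the part before the first comma.
    name=$(getent passwd "$1" 2>/dev/null | cut -d: -f5 | cut -d, -f1)
    if [ -n "$name" ]; then
        printf '%s\n' "$name"
    else
        # Account removed, or no GECOS data: only the identifier survives.
        printf '%s\n' "$1"
    fi
}
```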

In our particular case, we value all three of: 1) unique identifiers to 
be able to do cross referencing of reports between tools, 2) display of 
humanly readable names in output such as "svn log" or annotations in 
FishEye, ViewVC, Eclipse, or whatever tool the user is using, and 3) 
permanent historical record for auditing purposes.

Our exact compromise for the last three years is:

1) original svn:author value arrives on the server as "1234567" - a 
corporate unique identifier
2) pre-commit re-writes svn:author to "Full Name (<original svn:author 
value>)"
3) pre-commit adds <company>:gid as "<original svn:author value>"
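Step 2 is a plain string transformation, sketched here outside any hook so it can be checked on its own. `rewrite_author` is a hypothetical name; the full name is assumed to come from whatever directory lookup the hook already performs, and writing the results back into the transaction (including the `<company>:gid` property of step 3) is not shown.

```sh
# Hypothetical helper for step 2: build "Full Name (<original svn:author>)".
#   $1 = original svn:author (corporate unique identifier)
#   $2 = full name from the directory (may be empty)
rewrite_author() {
    if [ -n "$2" ]; then
        printf '%s (%s)\n' "$2" "$1"
    else
        # Assumption: with no name available, leave the identifier untouched.
        printf '%s\n' "$1"
    fi
}
```

For example, `rewrite_author 1234567 "Mark Mielke"` prints `Mark Mielke (1234567)`. Step 3 keeps the untouched identifier in a second revprop, so it stays recoverable after svn:author is rewritten.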

Then, as I mentioned, various other tools such as FishEye have explicit 
mappings from "Mark Mielke (1234567)" => "1234567" for each Subversion 
repository. We're primarily a ClearCase and Perforce shop right now - 
but even so, I have several Subversion repository mappings of this form. 
It works. It just sucks.
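For this particular scheme the mapping back from "Mark Mielke (1234567)" to "1234567" is mechanical, which is exactly why it only helps tools that already know the scheme. A minimal sketch, assuming the identifier is always the final parenthesised group; `author_key` is a hypothetical name.

```sh
# Hypothetical helper: recover the unique identifier from a rewritten
# svn:author such as "Mark Mielke (1234567)".
author_key() {
    # POSIX BRE: \( \) capture; the literal "( ... )" must end the value.
    key=$(printf '%s\n' "$1" | sed -n 's/^.*(\([^()]*\))$/\1/p')
    if [ -n "$key" ]; then
        printf '%s\n' "$key"
    else
        printf '%s\n' "$1"    # never rewritten: the value is already the key
    fi
}
```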

For svn:author to have structure - either internally using punctuation 
such as Unix gecos, or split out into separate attributes - and for 
tools to all honour this structure - would be far better. As 
Subversion is already well established, separate attributes are probably 
the best approach, as they would enable forwards and backwards 
compatibility for uses of svn:author implemented by the Subversion code 
base itself. Tools that know how to access and do intelligent things 
with the new fields could feel free to do so. Users of tools that do not 
do anything intelligent with the new fields could point to the 
Subversion release notes and the Subversion book and say "this new attribute 
svn:author-name should be recognized by your tool"; the change could make 
the tool's roadmap, and we could all be happy.

-- 
Mark Mielke<ma...@mielke.cc>

format of svn:author

Posted by Alan Barrett <ap...@cequrux.com>.

On Sun, 01 Jan 2012, Mark Mielke wrote:
>> Another idea is to change the revprop's value in the pre-commit 
>> or post-commit hook: [...]
>
> This is what we've been doing for about two years. It has 
> the consequence that tools don't automatically match unique 
> identifier to commit as they no longer match.

If your third-party tools can't extract the unique ID from 
svn:author = "Display Name <un...@domain>" then perhaps the 
problem lies at least as much in your third-party tools as in 
Subversion.
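The extraction Alan describes is short once a tool knows the scheme; Mark's reply objects that nothing tells a tool which scheme a given repository uses. This sketch assumes the unique ID is the local part of the address in "Display Name <uid@domain>"; `author_uid` and the sample address are hypothetical.

```sh
# Hypothetical helper: pull "uid" out of "Display Name <uid@domain>".
author_uid() {
    # POSIX BRE: capture the text between "<" and "@" at the end of the value.
    uid_part=$(printf '%s\n' "$1" | sed -n 's/^.*<\([^<>@]*\)@[^<>]*>$/\1/p')
    if [ -n "$uid_part" ]; then
        printf '%s\n' "$uid_part"
    else
        printf '%s\n' "$1"    # not in this scheme: treat the whole value as the ID
    fi
}
```

For example, `author_uid "Jane Doe <jdoe@example.com>"` prints `jdoe`.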

--apb (Alan Barrett)

Re: Problems with the documentation of Subversion dump format

Posted by Mark Mielke <ma...@mark.mielke.cc>.

On 12/31/2011 09:21 PM, Daniel Shahaf wrote:
> Mark Mielke wrote on Sat, Dec 31, 2011 at 01:00:12 -0500:
>> On 12/30/2011 09:35 PM, Daniel Shahaf wrote:
>>> AuthLDAPRemoteUserAttribute cn
>>>
>>> Then you can do
>>>
>>> % svn commit --username "Daniel Shahaf"
>>>
>>> and the logs will show
>>>
>>> ------------------------------------------------------------------------
>>> r1 | Daniel Shahaf | strftime(...) | 1 line
>>> ------------------------------------------------------------------------
>> We use this for a few services - but note how now instead of losing
>> the full name, it now loses the unique identifier. In a company of
>> 1,000+ people, there is a good chance for overlap of "cn". There
>> might be only one Mark Mielke, but other names such as John Sullivan
>> there could be many. The "cn" is not a unique identifier and cannot
>> be used to key off. It is for display purposes only.
>>
> Another idea is to change the revprop's value in the pre-commit or
> post-commit hook:
>      ..
>      author=`svnlook propget --revprop -t $TXN svn:author`
>      svnadmin setrevprop -t $TXN svn:author "`getent passwd $author | cut -d: -f5 | cut -d, -f1` <$a...@localdomain>"
>      ..
> and then people still authenticate with their uid's, but all existing
> tools will automatically show DVCS-style name+address author names.

This is what we've been doing for about two years. It has the 
consequence that tools don't automatically match the unique identifier 
to the commit, as svn:author no longer matches it.

> And if _that_ 's not good enough... what Stefan said: someone needs to
> sit down, define a problem, design a solution, and push it through.
> Perhaps it's as simple as defining a few new revprops?

Yes. That is what it would take to address this requirement permanently.

-- 
Mark Mielke<ma...@mielke.cc>