Posted to dev@subversion.apache.org by "Eric S. Raymond" <es...@thyrsus.com> on 2012/11/29 06:59:45 UTC

reposurgeon now writes Subversion repositories

This is something that probably doesn't happen very often -
cross-posting to the Subversion and git dev lists that is on-topic for
both :-).

The repo head version of reposurgeon can now write Subversion
repositories from its common git-import-stream-based representation of
repository histories, as well as reading them in.  This joins full
support for git, hg, and bzr; it means that in theory reposurgeon
could now be used to move revision histories from these systems to
Subversion, as well as the other way around.

(For those of you who have been living under a rock, reposurgeon is a
multi-VCS surgery and conversion tool. Since 2.x it does a more
intelligent job of lifting from Subversion to anything else than any
other tool I know of. Much more at <http://www.catb.org/esr/reposurgeon/>.)

Presently, writing (as opposed to reading) Subversion repos is more of
a stunt than a real production technique, and may always remain so.
It has serious limitations.  I am posting because I think the details
of those limitations will be of some technical interest to both
Subversion and git developers.

The indented paragraphs are taken from reposurgeon's manual page; I have
added some further notes.

  In summary, Subversion repository histories do not round-trip through
  reposurgeon editing. File content changes are preserved but some
  metadata is unavoidably lost.  Furthermore, writing out a DVCS history
  in Subversion also loses significant portions of its metadata.

  Writing a Subversion repository or dump stream discards author
  information, the committer's name, and the hostname part of the commit
  address; only the commit timestamp and the local part of the
  committer's email address are preserved, the latter becoming the
  Subversion author field.  However, reading a Subversion repository and
  writing it out again will preserve the author fields.

Subversion's metadata doesn't have separate author and committer
properties, and doesn't store anything but a Unix user ID as
attribution.  I don't see any way around this.
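As a minimal sketch of the attribution rule in the manual text above (not reposurgeon's actual code), the following derives a Subversion author from a git-style `Name <email>` identity by keeping only the local part of the address. The helper name `svn_author_from_ident` and the regular expression are assumptions for illustration.

```python
import re

# Hypothetical helper, not reposurgeon's code. An import-stream identity
# looks like "Full Name <local@host>"; only the local part survives.
IDENT = re.compile(r"^(?P<name>.*?)\s*<(?P<email>[^>]*)>$")


def svn_author_from_ident(ident: str) -> str:
    """Return the svn:author value for a 'Name <email>' identity."""
    match = IDENT.match(ident.strip())
    if match is None:
        raise ValueError(f"not a 'Name <email>' identity: {ident!r}")
    # The full name and the hostname part of the address are discarded.
    return match.group("email").split("@", 1)[0]


if __name__ == "__main__":
    print(svn_author_from_ident("Eric S. Raymond <esr@thyrsus.com>"))
```

Running it prints `esr`, which is all that survives of the original identity.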

  Import-stream timestamps have 1-second granularity. The subsecond
  parts of Subversion commit timestamps will be lost on their way through
  reposurgeon.

Unavoidable in moving from Subversion to git import streams, and one
of two places where git's data model requires us to throw away
information.  

However, I think I could preserve this information in a
Subversion-to-Subversion editing scenario by storing the incoming
timestamps as floats and only truncating them on import-stream output,
leaving the subseconds in place for Subversion output.
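The float-timestamp idea can be sketched like this, assuming svn:date values in Subversion's usual form (UTC, six fractional digits, trailing `Z`). The helper names are hypothetical; the point is that the subseconds are kept internally and dropped only when an import-stream date (whole seconds plus a UTC offset) is written.

```python
import math
from datetime import datetime, timezone

SVN_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_svn_date(value: str) -> float:
    """Parse an svn:date revprop into epoch seconds, keeping subseconds."""
    stamp = datetime.strptime(value, SVN_DATE_FORMAT).replace(tzinfo=timezone.utc)
    return stamp.timestamp()


def to_import_stream(ts: float) -> str:
    """Import-stream dates have 1-second granularity: drop the subseconds."""
    return f"{math.floor(ts)} +0000"


def to_svn_date(ts: float) -> str:
    """Subversion output can keep the microseconds."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime(SVN_DATE_FORMAT)
```

For example, `2012-11-29T06:59:45.123456Z` becomes `1354172385 +0000` on import-stream output but round-trips unchanged through `to_svn_date`.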

  Empty directories aren't represented in import streams. Consequently,
  reading and writing Subversion repositories preserves file content,
  but not empty directories.  It is also not guaranteed that, after a
  Subversion repository has been edited, the sequence of directory
  creations and deletions relative to other operations will be
  identical; the only guarantee is that enclosing directories will be
  created before any files in them are.
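The one ordering guarantee mentioned above (enclosing directories before the files in them) can be met with a simple parent-first walk. This is a hypothetical sketch, not reposurgeon's implementation; `existing` stands for directories assumed to be already present in the target revision.

```python
from posixpath import dirname


def directories_to_create(file_paths, existing=()):
    """Return missing directories for file_paths, parents before children."""
    known = set(existing)
    needed = []
    for path in file_paths:
        # Walk up from the file's directory, collecting missing ancestors.
        ancestors = []
        parent = dirname(path)
        while parent and parent not in known:
            ancestors.append(parent)
            known.add(parent)
            parent = dirname(parent)
        # Ancestors were collected child-first; emit them parent-first.
        needed.extend(reversed(ancestors))
    return needed
```

For `["a/b/c.txt", "a/d.txt"]` this yields `["a", "a/b"]`; nothing more about directory ordering is preserved.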

  When reading a Subversion repository, reposurgeon discards the special
  directory-copy nodes associated with branch creations.  These can't be
  recreated if and when the repository is written back out to
  Subversion; rather, each branch copy node from the original translates
  into a branch creation plus the first set of file modifications on the
  branch.

In theory, I could relax the rules of reposurgeon's internal
representation so that empty directory-creation and deletion nodes are
not discarded at read time but only when outputting a git event stream.

That would bring Subversion repositories closer to round-tripping, but
not get all the way there.  One problem is botched branch copies -
directory copies with cp(1) followed by Subversion add operations.
This is not an uncommon malformation; reposurgeon takes it in stride,
treating these as though they had been real branch copies and
simplifying the backlinks appropriately.

  When reading a Subversion repository, reposurgeon also automatically
  breaks apart mixed-branch commits.

It has to.  These just can't be represented in the import-stream model of
branching.
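A minimal sketch of splitting a mixed-branch commit, assuming a standard trunk/branches/tags layout (reposurgeon's real branch analysis is more involved). The helpers `branch_of` and `split_mixed_commit` are hypothetical. Each group becomes its own commit, which is one reason revision numbers shift on the way back out.

```python
def branch_of(path):
    """Classify a path under a standard layout; None if outside it."""
    parts = path.split("/")
    if parts[0] == "trunk":
        return "trunk"
    if parts[0] in ("branches", "tags") and len(parts) > 1:
        return "/".join(parts[:2])
    return None


def split_mixed_commit(paths):
    """Group one commit's changed paths by branch, in first-seen order."""
    groups = {}
    for path in paths:
        groups.setdefault(branch_of(path), []).append(path)
    return list(groups.items())
```

One mixed revision touching `trunk/a.c`, `branches/stable/a.c` and `trunk/b.c` turns into two commits, one per branch.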

  Because of the preceding two points, it is not guaranteed that 
  even revision numbers will be stable when a Subversion repository
  is read in and then written out!

So not only can Subversion repos fail to round-trip exactly, in the
presence of lots of branch copies and mixed-branch commits the
relationship between the read-in and written out revision numbers
could get pretty unpredictable.

  Subversion repositories are always written with a standard
  (trunk/tags/branches) layout. Thus, a repository with a nonstandard
  shape that has been analyzed by reposurgeon won't be written out with
  the same shape.

In particular, this means linear Subversion repositories with no trunk
(an organization some smaller projects used to use and might still)
will turn into branchy repos with trunk on the way out.
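The fixed output shape can be pictured as a mapping from git refs to Subversion paths. This sketch assumes `master` maps to trunk and other branches and tags map under `branches/` and `tags/`; the function name is hypothetical and not taken from reposurgeon.

```python
def svn_path_for_ref(ref):
    """Map a git ref to a standard-layout Subversion path (assumed rules)."""
    if ref == "refs/heads/master":
        return "trunk"
    for prefix, svn_dir in (("refs/heads/", "branches/"), ("refs/tags/", "tags/")):
        if ref.startswith(prefix):
            return svn_dir + ref[len(prefix):]
    raise ValueError(f"no standard-layout location for {ref!r}")
```

Under these rules a trunkless linear history, read in as `refs/heads/master`, comes back out under `trunk`.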

  Subversion has a concept of "flows"; that is, named segments of
  history corresponding to files or directories that are created when
  the path is added, cloned when the path is copied, and deleted when
  the path is deleted. This information is not preserved in import
  streams or the internal representation that reposurgeon uses.  Thus,
  after editing, the flow boundaries of a Subversion history may be
  arbitrarily changed.

This is me being obsessive about documenting the details.  I think it
is doubtful that most Subversion users even know flows exist.

  Bugs: Presently, writing out a history to a Subversion repository does
  not create mergeinfo properties representing branch merges. It also
  loses all information about lightweight tags (though annotated tags
  are turned into Subversion-style directory copies). These bugs will
  probably be fixed in future reposurgeon releases.

I'm also not sure the present code handles branchiness exactly right.  
My next task is to write a test suite for this new feature.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>

The Constitution is not neutral. It was designed to take the
government off the backs of the people.
	-- Justice William O. Douglas 

Re: reposurgeon now writes Subversion repositories

Posted by Markus Schaber <m....@codesys.com>.
Hi,

From: Eric S. Raymond [mailto:esr@thyrsus.com]
> > How does reposurgeon handle empty directories with (node) properties?
> 
> Currently by ignoring all of them except svn:ignore, which it turns
> into .gitignore content on the gitspace side.  And now vice-versa, too.
> 
> Not clear what else it *could* do.  I'd take suggestions.
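One plausible svn:ignore-to-.gitignore translation (a sketch, not necessarily what reposurgeon does) anchors each pattern to the directory that carried the property. svn:ignore applies only to that directory's immediate children; in .gitignore a leading slash anchors a pattern and `*` does not match `/`, so the non-recursive meaning is kept. Writing everything into one root-level .gitignore is an assumption.

```python
def gitignore_lines(directory, svn_ignore):
    """Translate an svn:ignore value set on `directory` into root .gitignore lines."""
    base = directory.strip("/")
    prefix = "/" + base + "/" if base else "/"
    # svn:ignore holds one glob pattern per line; blank lines are skipped.
    return [prefix + p.strip() for p in svn_ignore.splitlines() if p.strip()]
```

An svn:ignore of `*.o` and `build` on `src` becomes `/src/*.o` and `/src/build`.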

AFAIR, SvnBridge (which bridges SVN to Team Foundation Server for CodePlex) creates a hidden .svnproperties file where all the properties of the directory and files are stored.

I'm not really sure, but maybe this could be used as some standard to bridge svn properties to non-svn VCSes.

Best regards

Markus Schaber

CODESYS(r) a trademark of 3S-Smart Software Solutions GmbH

Inspiring Automation Solutions

3S-Smart Software Solutions GmbH
Dipl.-Inf. Markus Schaber | Product Development Core Technology
Memminger Str. 151 | 87439 Kempten | Germany
Tel. +49-831-54031-979 | Fax +49-831-54031-50

E-Mail: m.schaber@codesys.com | Web: http://www.codesys.com
CODESYS internet forum: http://forum.codesys.com

Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade register: Kempten HRB 6186 | Tax ID No.: DE 167014915

Re: reposurgeon now writes Subversion repositories

Posted by Daniel Shahaf <da...@elego.de>.
Philip Martin wrote on Thu, Nov 29, 2012 at 13:27:09 +0000:
> Daniel Shahaf <da...@elego.de> writes:
> 
> > Specifically, the server code special-cases svn:author and svn:date ---
> > an administrator would have to use a pre-commit hook (or patch the
> > server) to avoid those being set from the authentication info and system
> > clock.
> 
>   - for RA access (a network client like svn) the server sets both
>     svn:author and svn:date, the client has no control.
> 
>   - for FS access (a filesystem client like svnadmin) the server sets
>     svn:date and the client controls svn:author.
> 
> svnadmin load is not as efficient as it could be as it has to do a
> revprop change after each commit to set svn:date.

And replay() users (svnrdump/svnsync) need to set both date and author.

Re: reposurgeon now writes Subversion repositories

Posted by Philip Martin <ph...@wandisco.com>.
Daniel Shahaf <da...@elego.de> writes:

> Specifically, the server code special-cases svn:author and svn:date ---
> an administrator would have to use a pre-commit hook (or patch the
> server) to avoid those being set from the authentication info and system
> clock.

  - for RA access (a network client like svn) the server sets both
    svn:author and svn:date, the client has no control.

  - for FS access (a filesystem client like svnadmin) the server sets
    svn:date and the client controls svn:author.

svnadmin load is not as efficient as it could be as it has to do a
revprop change after each commit to set svn:date.
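For context on the dump side: in a dump stream, svn:author and svn:date travel as ordinary revision properties, which `svnadmin load` then applies (svn:date by way of the revprop change described above). The sketch below builds one revision record's header and property block, assuming the version-2 dump format; it omits the stream header, UUID record and node records, and the helper names are hypothetical.

```python
def props_block(props):
    """Serialize revision properties as K/V pairs terminated by PROPS-END."""
    out = []
    for key, value in props.items():
        k, v = key.encode("utf-8"), value.encode("utf-8")
        out.append(b"K %d\n%s\nV %d\n%s\n" % (len(k), k, len(v), v))
    out.append(b"PROPS-END\n")
    return b"".join(out)


def revision_record(revnum, props):
    """A revision record with no text content: both lengths are the props size."""
    block = props_block(props)
    header = b"Revision-number: %d\nProp-content-length: %d\nContent-length: %d\n\n" % (
        revnum, len(block), len(block))
    return header + block + b"\n"
```

Lengths are byte counts of the encoded property block, including the `PROPS-END` line.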

-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download

Re: reposurgeon now writes Subversion repositories

Posted by Daniel Shahaf <da...@elego.de>.
Branko Čibej wrote on Thu, Nov 29, 2012 at 13:41:34 +0100:
> On 29.11.2012 12:46, Eric S. Raymond wrote:
> > Daniel Shahaf <da...@elego.de>:
> >>> Subversion's metadata doesn't have separate author and committer
> >>> properties, and doesn't store anything but a Unix user ID as
> >>> attribution.  I don't see any way around this.
> >> You're not fully informed, then.
> >>
> >> 1) svn:author revprops can contain any UTF-8 string.  They are not
> >> restricted to Unix user id's.  (For example, they can contain full
> >> names, if the administrator so chooses.)
> > Right.  At one point during the development of this feature I was
> > accidentally storing the full email field in this property.  So I
> > already knew that this is allowed at some level.  
> >
> > And, I have no trouble believing that svn log will cheerfully echo
> > anything that I choose to stuff in that field.  
> >
> > But...
> >
> > (1) How much work would it be to set up a Subversion installation
> > so that when I svn commit, the tool does the right thing, e.g. puts
> > a DVCS-style fullname/email string in there?
> 
> I don't know how common that practice is, but I've worked on a project
> where svn:author was filled in from the DN and e-mail attributes of an
> X.509 certificate. It's also quite easy to set svn:author from
> information stored in LDAP (that is, if you find anything about LDAP
> actually easy).
> 

Another option is to change the svn:author prop in the pre-commit hook.

> > RFC: If I wrote a patch that let Subversion users set their own
> > content string for the author field in ~/.subversion/config, would
> > you merge it?  Because I'd totally write that.
> 
> Hint: svn commit --with-revprop svn:author="Twizzle Strongpants
> <ts...@interwebs>"
> 
> I personally wouldn't mind if that were a user preference in the config
> file. It'd have to be a per-server config option, however; and even
> better, per-repository, which is a concept that the Subversion config
> file does not currently support. (There's a reason why I put my ID into
> .git/config, not ~/.gitconfig.)
> 
> Note that it's up to the server administrator to actually allow clients
> to set svn:author (and any other revision property). The assumed, and
> most common, configuration is that the server derives svn:author from
> authentication information.
> 

Specifically, the server code special-cases svn:author and svn:date ---
an administrator would have to use a pre-commit hook (or patch the
server) to avoid those being set from the authentication info and system
clock.

> [...]
> 
> >> You might also seek community consensus to reserve an svn:foo name for
> >> the "original author" property --- perhaps svn:original-author --- so
> >> that reposurgeon and other git->svn tools can interoperate in the way
> >> they transfer the "original author" information.
> > OK.  But I like the idea of letting the users set their own author
> > content string better.  Instead of another layer of kluges, why
> > shouldn't Subversion join the DVCSes in the happy land of
> > Internet-scoped attributions?
> 
> This discussion has come up before. Today, the assumption that
> svn:author is something that the server has verified (modulo admins'
> shenanigans) is pretty much cast in concrete.
> 
> I'm open to suggestions, up to and including breaking that assumption,
> though obviously I'd prefer not to.
> 
> -- Brane
> 
> 
> -- 
> Branko Čibej
> Director of Subversion | WANdisco | www.wandisco.com
> 

Re: reposurgeon now writes Subversion repositories

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Eric S. Raymond wrote on Fri, Nov 30, 2012 at 05:36:17 -0500:
> Daniel Shahaf <d....@daniel.shahaf.name>:
> > If you can have a username= in the per-server section, you probably can
> > have it _today_ in the [global] section too and it would take effect
> > (just like N other options that can be set at either global or
> > per-server scope)...
> > 
> > So you'd need to invent a new option?
> 
> Maybe not.  That would be good.  
> 
> Where is this documented?

Existing parameters, as well as the semantics of the [global] section,
are documented in the default 'servers' file:

% rm -rf d
% svn help --config-dir=d >/dev/null
% $PAGER d/servers
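To make the global-versus-per-server scoping concrete, here is a rough sketch of resolving an option from a `servers`-style file: a `[groups]` section maps group names to host wildcards, a matching group's section wins, and `[global]` is the fallback. First-match group selection is an assumption, and `username` is used only because it is the option under discussion; whether Subversion honors it in a given section is not established here.

```python
import configparser
from fnmatch import fnmatchcase


def server_option(config_text, hostname, option):
    """Look up `option` for `hostname`: matching group first, then [global]."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep group names case-sensitive
    parser.read_string(config_text)
    if parser.has_section("groups"):
        for group, patterns in parser.items("groups"):
            if any(fnmatchcase(hostname.lower(), p.strip().lower())
                   for p in patterns.split(",")):
                if parser.has_option(group, option):
                    return parser.get(group, option)
                break  # assumed: only the first matching group is consulted
    if parser.has_option("global", option):
        return parser.get("global", option)
    return None
```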

Re: reposurgeon now writes Subversion repositories

Posted by "Eric S. Raymond" <es...@thyrsus.com>.
Daniel Shahaf <d....@daniel.shahaf.name>:
> If you can have a username= in the per-server section, you probably can
> have it _today_ in the [global] section too and it would take effect
> (just like N other options that can be set at either global or
> per-server scope)...
> 
> So you'd need to invent a new option?

Maybe not.  That would be good.  

Where is this documented?

Re: reposurgeon now writes Subversion repositories

Posted by Alan Barrett <ap...@cequrux.com>.
On Sat, 01 Dec 2012, Eric S. Raymond wrote:
>> Alternative server-side implementation (via breser):
>> [[[
>> command="svnserve -t --tunnel-user='Daniel Shahaf'" ssh-rsa ...
>> ]]]
>
> Um, does this mean everyone's commits are going to look like
>Daniel Shahaf made them?  If not, where is --tunnel-user going to
>come from?

It comes from the .ssh/authorized_keys file, in a context 
that is associated with exactly one ssh key (the "ssh-rsa 
..." part); this would be the same place that previously had 
"--tunnel-user=danielsh".

--apb (Alan Barrett)

Re: reposurgeon now writes Subversion repositories

Posted by "Eric S. Raymond" <es...@thyrsus.com>.
Daniel Shahaf <d....@daniel.shahaf.name>:
> Server-side implementation, independent of RA method: (via brane)

Ah, now that looks somewhat like progress.  But some (possibly all) of
these solutions have serious weaknesses which you need to think about.

> [[[
> #!/usr/bin/env python
> 
> import sys
> from svn.repos import *
> from svn.fs import *
> from svn.core import SVN_PROP_REVISION_AUTHOR
> 
> FULLNAMES = {
>   'danielsh': 'Daniel Shahaf',
> }
> 
> reposdir, txnname = sys.argv[1:3]
> 
> repos = svn_repos_open(reposdir, None)
> fs = svn_repos_fs(repos)
> txn = svn_fs_open_txn(fs, txnname, None)
> propval = svn_fs_txn_prop(txn, SVN_PROP_REVISION_AUTHOR, None)
> svn_fs_change_txn_prop(txn, SVN_PROP_REVISION_AUTHOR,
>                        FULLNAMES.get(propval, propval), None)
> ]]]

This one confines your Unix-ID adhesion to the FULLNAMES array,  which
is a long step in the right direction because it means your repo history
will be local-ID-clean.  

But it doesn't actually solve the mobility problem.  If the project
ever moves, you still have to patch the FULLNAMES dictionary by hand.
This approach won't scale very well.

I also note that you do really want "J. Random User <jr...@foobar.org>"
with a preferred "home" address as part of the mix, because the
entropy of human names alone is not quite high enough.  Yes, if I see
"Daniel Shahaf" I'm pretty sure there is only one of those.  But
"William Smith" or "Robert Jones"? :-)
 
> Alternative server-side implementation (via markphip):
> [[[
> AuthLDAPRemoteUserAttribute cn
> ]]]

A variant of this that does "J. Random User <jr...@foobar.org>"
looks like it might work provided there's an LDAP directory and we trust 
the LDAP directory to be up to date.  The second assumption seems
reasonable if we grant the first.  

But the first?  I've heard of LDAP and know roughly what it does, but
I've never seen a live instance.  Forges don't have them.  Maybe I'm
being parochial, but this seems like a solution for a case too unusual
to be very interesting.

> Alternative server-side implementation (via breser):
> [[[
> command="svnserve -t --tunnel-user='Daniel Shahaf'" ssh-rsa ...
> ]]]

Um, does this mean everyone's commits are going to look like
Daniel Shahaf made them?  If not, where is --tunnel-user going to
come from?

> Client-side implementation (via danielsh):
> [[[
> [ -n "${EMAIL}" ] && svn() {
>  if [ x"$1" = x"ci" ] || [ x"$1" = x"commit" ]; then
>   command svn --with-revprop=svn:x-committer-email=${EMAIL} "$@"
>  else
>   command svn "$@"
>  fi
> }
> ]]]

Bletch.  This one is begging for failure unless you can train your
users to use a wrapper script every time - good luck with that.  One
important case where this approach will break, and cause acrimony, is
Emacs VC mode.  That's somewhere up to 50% of your users under
open-source platforms, if the stats on editor usage are to be believed.

The lesson from this criticism is intended to be that it's not
enough to make Internet-scoped IDs possible, you have to make 
them *easy* - that is, not disruptive of normal workflow.

But this has been fruitful.  I think I can write a simple proposal
about how to solve this problem now.  I'll do it in my next email.

Re: reposurgeon now writes Subversion repositories

Posted by Markus Schaber <m....@codesys.com>.
Hi, Brane,

From: Branko Čibej [mailto:brane@wandisco.com]

> Sent: Saturday, December 1, 2012 15:24
> To: dev@subversion.apache.org
> Subject: Re: reposurgeon now writes Subversion repositories
[...]
> P.S.: I find it fascinating that DVCS aficionados haven't noticed that
> GitHub takes the D out of DVCS very effectively, thereby making git
> actually useful for most normal people.

This is my personal "Quote of the week". :-)


Best regards

Markus Schaber


Re: reposurgeon now writes Subversion repositories

Posted by Branko Čibej <br...@wandisco.com>.
On 01.12.2012 14:14, Eric S. Raymond wrote:
> (Apologies if this is a duplicate send.  I just had a disturbing
> glitch in my MUA and want to make sure it got out.)
>
> Daniel Shahaf <d....@daniel.shahaf.name>:
>> Server-side implementation, independent of RA method: (via brane)
> Ah, now that looks somewhat like progress.  But some (possibly all) of
> these solutions have serious weaknesses which you need to think about.
>
>> [[[
>> #!/usr/bin/env python
>>
>> import sys
>> from svn.repos import *
>> from svn.fs import *
>> from svn.core import SVN_PROP_REVISION_AUTHOR
>>
>> FULLNAMES = {
>>   'danielsh': 'Daniel Shahaf',
>> }
>>
>> reposdir, txnname = sys.argv[1:3]
>>
>> repos = svn_repos_open(reposdir, None)
>> fs = svn_repos_fs(repos)
>> txn = svn_fs_open_txn(fs, txnname, None)
>> propval = svn_fs_txn_prop(txn, SVN_PROP_REVISION_AUTHOR, None)
>> svn_fs_change_txn_prop(txn, SVN_PROP_REVISION_AUTHOR,
>>                        FULLNAMES.get(propval, propval), None)
>> ]]]
> This one confines your Unix-ID adhesion to the FULLNAMES array,  which
> is a long step in the right direction because it means your repo history
> will be local-ID-clean.  
>
> But it doesn't actually solve the mobility problem.  If the project
> ever moves, you still have to patch the FULLNAMES dictionary by hand.
> This approach won't scale very well.

Oh come on. Daniel was giving an example cobbled up in all of 5 minutes.
Surely you can imagine replacing FULLNAMES with some user database?

> I also note that you do really want "J. Random User <jr...@foobar.org>"
> with a preferred "home" address as part of the mix, because the
> entropy of human names alone is not quite high enough.  Yes, if I see
> "Daniel Shahaf" I'm pretty sure there is only one of those.  But
> "William Smith" or "Robert Jones"? :-)

See above. You can put anything into FULLNAMES and/or a database and/or
LDAP (which is just a database).
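Following the point that FULLNAMES can be any user database, a minimal sketch replaces the hard-coded dictionary with a text file of `login = Full Name <email>` lines (the format git-svn uses for its authors file). The file contents and helper names here are hypothetical; in the hook, `rewrite_author(authors, propval)` would stand in for `FULLNAMES.get(propval, propval)`.

```python
def load_authors(text):
    """Parse 'login = Full Name <email>' lines into a dict; '#' starts a comment."""
    authors = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        login, sep, ident = line.partition("=")
        if not sep:
            raise ValueError(f"malformed authors line: {line!r}")
        authors[login.strip()] = ident.strip()
    return authors


def rewrite_author(authors, username):
    """Return the full identity, or the username unchanged if unknown."""
    return authors.get(username, username)
```

Because the mapping lives in a plain file, it can be versioned and moved along with the project rather than patched by hand.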

>> Alternative server-side implementation (via markphip):
>> [[[
>> AuthLDAPRemoteUserAttribute cn
>> ]]]
> A variant of this that does "J. Random User <jr...@foobar.org>"
> looks like it might work provided there's an LDAP directory and we trust 
> the LDAP directory to be up to date.  The second assumption seems
> reasonable if we grant the first.
>
> But the first?  I've heard of LDAP and know roughly what it does, but
> I've never seen a live instance.  Forges don't have them.  Maybe I'm
> being parochial, but this seems like a solution for a case too unusual
> to be very interesting.

Oh right. Does it make the solution any less unusual if I tell you that
all of the ASF services, including Subversion, have single-signon via
LDAP? Or that you can just as easily replace mod_ldap with
mod_authn_<database-of-choice>, which essentially brings you back to the
pre-commit hook example.

> But this has been fruitful.  I think I can write a simple proposal
> about how to solve this problem now.  I'll do it in my next email.

No offence, but it sure looks as if you're deliberately nitpicking in
order to give yourself an excuse for writing a proposal for a feature
that Subversion, essentially, already has.

Certainly I'll read your proposal and don't intend to dismiss it out of
hand. But trusting the server to properly authenticate committers is a
basic axiom of Subversion's centralized model. And for the record, it's
also a basic axiom of GitHub's centralized model.


-- Brane


P.S.: I find it fascinating that DVCS aficionados haven't noticed that
GitHub takes the D out of DVCS very effectively, thereby making git
actually useful for most normal people.


Re: reposurgeon now writes Subversion repositories

Posted by Ben Reser <be...@reser.org>.
On Sat, Dec 1, 2012 at 8:14 AM, Eric S. Raymond <es...@thyrsus.com> wrote:
> This one confines your Unix-ID adhesion to the FULLNAMES array,  which
> is a long step in the right direction because it means your repo history
> will be local-ID-clean.

It confines it to wherever that Python script could be taught to get
it from.  I'm sure you can modify the script to get the mapping from a
different source.

For that matter, you could keep the script in the repo and use a
post-commit hook that updates the installed copy every time someone
commits a change to it.  Then the script moves with the repo.

> But it doesn't actually solve the mobility problem.  If the project
> ever moves, you still have to patch the FULLNAMES dictionary by hand.
> This approach won't scale very well.

Of course it doesn't scale.  It's a trivial example to demonstrate the
technique.

What I don't understand is this: your hypothetical situation demands
an awful lot of Subversion.  You've scoped things like an issue tracker
and other things as being part of this, but for some reason you haven't
bothered to scope an authentication system, or exporting and moving
the users.  All of these forge sites let you access the repo with the
same username and password as the issue tracker, etc.

So you need some sort of federated (even if it's just specific to each
project) authentication system.  Subversion doesn't provide that for
you, nor should it.

You're probably not going to find one that's ready made to your
situation either.  You're going to need to do some thinking about how
to configure things.

> I also note that you do really want "J. Random User <jr...@foobar.org>"
> with a preferred "home" address as part of the mix, because the
> entropy of human names alone is not quite high enough.  Yes, if I see
> "Daniel Shahaf" I'm pretty sure there is only one of those.  But
> "William Smith" or "Robert Jones"? :-)

And it's trivial to adjust it to be that way.

> But the first?  I've heard of LDAP and know roughly what it does, but
> I've never seen a live instance.  Forges don't have them.  Maybe I'm
> being parochial, but this seems like a solution for a case too unusual
> to be very interesting.

Why not?  What's so hard about setting up an LDAP instance for the project?

>> Alternative server-side implementation (via breser):
>> [[[
>> command="svnserve -t --tunnel-user='Daniel Shahaf'" ssh-rsa ...
>> ]]]
>
> Um, does this mean everyone's commits are going to look like
> Daniel Shahaf made them?  If not, where is --tunnel-user going to
> come from?

No.  This setup is added to the start of every line (a different value
for each user) of the authorized_keys file belonging to the account
you're having people use with svn+ssh.  Generally I'd expect whatever
system you're using to manage these keys to handle this for you (e.g.,
a user pastes their public key into some web form and then the system
edits the authorized_keys file).  You'll have to write something.
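As a sketch of the "something" such a key-management system might write, the helper below (hypothetical, not part of any Subversion tool) produces one authorized_keys line per user with a forced `svnserve -t --tunnel-user=...` command. The extra OpenSSH restriction options are a common hardening choice assumed here, not something the thread specifies.

```python
import shlex

SSH_RESTRICTIONS = "no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty"


def authorized_keys_line(tunnel_user, public_key):
    """Build one authorized_keys entry forcing svnserve for this user's key."""
    if "\n" in tunnel_user or "\n" in public_key:
        raise ValueError("authorized_keys entries must be single lines")
    command = "svnserve -t --tunnel-user=" + shlex.quote(tunnel_user)
    # Inside command="...", a literal double quote must be backslash-escaped.
    command = command.replace('"', '\\"')
    return f'command="{command}",{SSH_RESTRICTIONS} {public_key.strip()}'
```

For `Daniel Shahaf` this yields a line beginning `command="svnserve -t --tunnel-user='Daniel Shahaf'",` followed by the restrictions and the key, so each key is tied to exactly one identity.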

> The lesson from this criticism is intended to be that it's not
> enough to make Internet-scoped IDs possible, you have to make
> them *easy* - that is, not disruptive of normal workflow.

I'd say that the choices you've been presented with are relatively
easy to implement.  Tons of corporate users have managed to implement
things like this.

What isn't easy is what you're really asking to do, which is systems
design.  You want to pull together a bunch of disparate programs and
make them work together in a coordinated and seamless way.  That's not
terribly easy to do without putting some time into building the
infrastructure around them.

Which is really what a forge site is about.

If you want to build a forge site that has portable setups, then you're
going to have to write a way to export not just all the data (the
repositories, the issue tracker DB, the wiki DB, etc.) but also all the
glue between those pieces.

Unless you've got multiple existing forges already interested in
coming together to implement an agreed-upon data format, your best bet
is going to be implementing a packaged-up system that uses various
systems and then exports and imports your data format.

We've gone well beyond the area Subversion is involved in, and quite
frankly we're heading entirely into off-topic design work for your
forge.

Re: reposurgeon now writes Subversion repositories

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Alan Barrett wrote on Sat, Dec 01, 2012 at 12:05:48 +0300:
> Perhaps it would be a good first step to add examples to the 
> documentation, showing how the admin can use "Full Name <em...@address>" 
> in the svn:author field, with all the common access methods.

Server-side implementation, independent of RA method: (via brane)
[[[
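#!/usr/bin/env python
# Subversion runs the pre-commit hook as: pre-commit REPOS-PATH TXN-NAME

import sys
from svn.repos import *
from svn.fs import *
from svn.core import SVN_PROP_REVISION_AUTHOR

# Authenticated usernames mapped to the attribution to record instead.
FULLNAMES = {
  'danielsh': 'Daniel Shahaf',
}

reposdir, txnname = sys.argv[1:3]

# Open the pending transaction and rewrite its svn:author revprop;
# usernames missing from FULLNAMES are left unchanged.
repos = svn_repos_open(reposdir, None)
fs = svn_repos_fs(repos)
txn = svn_fs_open_txn(fs, txnname, None)
propval = svn_fs_txn_prop(txn, SVN_PROP_REVISION_AUTHOR, None)
svn_fs_change_txn_prop(txn, SVN_PROP_REVISION_AUTHOR,
                       FULLNAMES.get(propval, propval), None)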
]]]
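Following the "Full Name <email>" idea, the lookup step can be pulled out into a plain function that is easy to test outside the hook. This is a minimal sketch. The DIRECTORY table, its email address and the attribution_for name are hypothetical. In a real pre-commit hook, the result would be written back with svn_fs_change_txn_prop, as in the script above.

```python
# Hypothetical directory: authenticated username -> (full name, email).
DIRECTORY = {
    "danielsh": ("Daniel Shahaf", "danielsh@example.org"),
}


def attribution_for(username, directory=DIRECTORY):
    """Return the svn:author value to store for an authenticated username.

    Known users become "Full Name <email>". Unknown users, and an absent
    author (None), are passed through unchanged.
    """
    if username is None or username not in directory:
        return username
    name, email = directory[username]
    return "%s <%s>" % (name, email)


if __name__ == "__main__":
    print(attribution_for("danielsh"))
    print(attribution_for("someone-else"))
```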

Alternative server-side implementation (via markphip):
[[[
AuthLDAPRemoteUserAttribute cn
]]]

Alternative server-side implementation (via breser):
[[[
command="svnserve -t --tunnel-user='Daniel Shahaf'" ssh-rsa ...
]]]

Client-side implementation (via danielsh):
[[[
[ -n "${EMAIL}" ] && svn() {
 if [ x"$1" = x"ci" ] || [ x"$1" = x"commit" ]; then
  command svn --with-revprop=svn:x-committer-email=${EMAIL} "$@"
 else
  command svn "$@"
 fi
}
]]]
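The shell function adds the revprop only for `ci` or `commit`. The same decision can be written as a Python function that builds the argument list, which makes it easy to check. This is a sketch only. The build_svn_argv name is hypothetical, and the property name is copied from the message above. Subversion reserves the svn: prefix, so some clients may refuse a user-supplied revprop in that namespace. In that case a name without the prefix would be needed.

```python
import os
import subprocess


def build_svn_argv(args, email, prop="svn:x-committer-email"):
    """Return the argv to run for `svn <args>`.

    Mirrors the shell wrapper: only commits (`ci` or `commit`) get the
    extra --with-revprop option, and only when an email address is known.
    """
    argv = ["svn"]
    if email and args and args[0] in ("ci", "commit"):
        argv.append("--with-revprop=%s=%s" % (prop, email))
    return argv + list(args)


def run_svn(args):
    # EMAIL is read from the environment, as in the shell version.
    return subprocess.call(build_svn_argv(args, os.environ.get("EMAIL")))
```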

Re: reposurgeon now writes Subversion repositories

Posted by Daniel Shahaf <da...@apache.org>.
On Sat, Dec 01, 2012 at 01:36:10AM -0500, Eric S. Raymond wrote:
> Branko Čibej <br...@wandisco.com>:
> > Why do forges not do that? I don't know, but it's definitely not because
> > Subversion doesn't give them fifteen ways of manipulating the svn:author
> > property.
> 
> I don't know either.  
> 
> I do know that protests to me of the general form "if they'd just use 
> poorly-documented alchemical formula XYZ everything would be fine" aren't going
> to solve your problem.  In no case have I ever seen, etc.  The guy in the
> trenches is telling you that your fifteen ways aren't producing any
> result he can distinguish from "svn:author is always a Unix user ID".
> 
> You can throw up your hands and say "the forges aren't doing it right", sure.

The C code --- on both client and server --- _does not know_ what the email
addresses are; either the user or the admin would need to enable that feature
explicitly.  That's a social problem, not a technical one.

Re: reposurgeon now writes Subversion repositories

Posted by "Eric S. Raymond" <es...@thyrsus.com>.
Alan Barrett <ap...@cequrux.com>:
> Perhaps it would be a good first step to add examples to the
> documentation, showing how the admin can use "Full Name
> <em...@address>" in the svn:author field, with all the common access
> methods.

Yes. I think it is (a) possible that better documentation can solve this
problem, and (b) certain that better documentation is *necessary* to solve
this problem.

I'm willing to help.  You can look at the description of the dump-load
format at notes/dump-load-format.txt, most of which I wrote earlier
this year, to see that this is not an idle promise.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>

Re: reposurgeon now writes Subversion repositories

Posted by Alan Barrett <ap...@cequrux.com>.
On Sat, 01 Dec 2012, Eric S. Raymond wrote:
>I've lost count of the number of Subversion repo
>lifts I've done (has to be more than a dozen at this point), and in no
>case have I ever seen *anything* but a local Unix ID in the svn:author
>property.

Yes, it's probably true that most svn repositories use short 
strings that resemble unix user ids, and a lot of the svn 
documentation uses such strings in examples.  But it's also 
true that the admin can use almost any string they like.  In 
repositories that I have set up, I have always used short strings 
that resemble local unix IDs, but in most cases those strings 
would not have been valid unix user names on the server host.

Perhaps it would be a good first step to add examples to 
the documentation, showing how the admin can use "Full Name 
<em...@address>" in the svn:author field, with all the common 
access methods.
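When svn:author does hold a "Full Name <email>" string, a conversion tool can split it back into separate name and address fields. Anything else can be treated as a bare local ID. This is a minimal sketch. The parse_author helper is hypothetical, and its pattern is deliberately loose: it is not a full RFC 5322 address parser.

```python
import re

# "Full Name <address>", with optional surrounding whitespace.
_AUTHOR_RE = re.compile(r"^\s*(?P<name>[^<>]*?)\s*<(?P<email>[^<>\s]+)>\s*$")


def parse_author(value):
    """Split an svn:author value into (name, email).

    Returns (name, email) for "Full Name <email>" strings, and
    (value, None) for anything else, such as a plain local user ID.
    """
    match = _AUTHOR_RE.match(value)
    if match is None:
        return value, None
    return match.group("name") or None, match.group("email")
```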

--apb (Alan Barrett)

Re: reposurgeon now writes Subversion repositories

Posted by Greg Stein <gs...@gmail.com>.
On Sat, Dec 1, 2012 at 1:36 AM, Eric S. Raymond <es...@thyrsus.com> wrote:
>...
>> Why do forges not do that? I don't know, but it's definitely not because
>> Subversion doesn't give them fifteen ways of manipulating the svn:author
>> property.
>
> I don't know either.
>
> I do know that protests to me of the general form "if they'd just use
> poorly-documented alchemical formula XYZ everything would be fine" aren't going
> to solve your problem.  In no case have I ever seen, etc.  The guy in the
> trenches is telling you that your fifteen ways aren't producing any
> result he can distinguish from "svn:author is always a Unix user ID".
>
> You can throw up your hands and say "the forges aren't doing it right", sure.
> And if you want to sleepwalk your way into obsolescence, that'll be a fine
> and effective way to get there.

Woah. Wait a minute, Eric. You're the one positing a [Federation]
scenario, then stating that Subversion does not meet that criterion. I
believe that is called a "Strawman".

I really like your idea of establishing a GUID, and transportation of
artifacts. This is all good. I also see that a good number of svn devs
are engaging with you on that idea. Yet... putting up a strawman and
killing it doesn't work very well.

If we back up a step: Forges have been using svn:author as the
*authenticated* identity. They may express that identity as a simple
username, or as an LDAP attribute, or as an email name. If the forges
are not storing the identity per your ideal, then it seems wrong to
lay that on Subversion.

I *do* believe it is fair to state "the standard Subversion tools
should do $X to enable better federation". And I believe that is where
you can help.

Historically, Subversion has associated commits with authenticated
identities. It seems that you propose to adjust/augment that
relationship. If you can clarify, then I think we can make it happen.

Cheers,
-g

Re: reposurgeon now writes Subversion repositories

Posted by "Eric S. Raymond" <es...@thyrsus.com>.
Branko Čibej <br...@wandisco.com>:
> On 30.11.2012 22:53, Eric S. Raymond wrote:
> 
> > The problem is that in order for that state to be mobile, none of it
> > can have pointers to data that can't move off the host server.  In
> > particular, *all user identities have to be Internet-scoped* rather
> > than local Unix IDs.
> 
> At this point I have to ask if you've been reading our responses.

Sure I have.  I'm not insisting on "user must be able to set the
attribution ID", I'm insisting on "it has to still be meaningful when
the project moves".  If OpenID or any of the other similar schemes had
actually succeeded, they would do fine.  The roll-your-own-ID-string
practice is forced on us because there is no authoritative identity
service.

> Nothing requires svn:author to contain Unix user IDs. Nothing prevents
> the server from putting e-mail addresses, or even "Name Surname
> <e...@mail>" strings into svn:author. We specifically designed the property
> so that the server /can/ do this, and there are several widely-used
> mechanisms for doing exactly that, regardless of access method;
> http[s]://, svn+ssh://, svn:// (with or without SASL) all give the
> administrator the hooks to do this.

Right, I understand this.  Forgive me if it seems like a theoretical
quibble, though.  I've lost count of the number of Subversion repo
lifts I've done (has to be more than a dozen at this point), and in no
case have I ever seen *anything* but a local Unix ID in the svn:author
property.

> Why do forges not do that? I don't know, but it's definitely not because
> Subversion doesn't give them fifteen ways of manipulating the svn:author
> property.

I don't know either.  

I do know that protests to me of the general form "if they'd just use 
poorly-documented alchemical formula XYZ everything would be fine" aren't going
to solve your problem.  In no case have I ever seen, etc.  The guy in the
trenches is telling you that your fifteen ways aren't producing any
result he can distinguish from "svn:author is always a Unix user ID".

You can throw up your hands and say "the forges aren't doing it right", sure.
And if you want to sleepwalk your way into obsolescence, that'll be a fine
and effective way to get there.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>

Re: reposurgeon now writes Subversion repositories

Posted by "Eric S. Raymond" <es...@thyrsus.com>.
Daniel Shahaf <d....@daniel.shahaf.name>:
> Haven't a few projects decided to require PGP-signed revisions instead?

Monotone tried an approach in which every revision is cryptosigned.
It pretty much sank, and for a surprising reason.  According to what the
author told me back in 2010 (I think), the computational cost of full
crypto hash chaining is so high that users reject it. 
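In hash chaining, each revision's identifier is a cryptographic hash over its own content plus its parents' identifiers. Altering any ancestor therefore changes every descendant's ID. The toy sketch below uses SHA-256 from Python's hashlib. It illustrates the idea only. It is not Monotone's actual revision format, and it leaves out the per-revision signatures.

```python
import hashlib


def revision_id(content, parent_ids):
    """Toy revision ID: SHA-256 over the sorted parent IDs, then the content."""
    h = hashlib.sha256()
    for parent in sorted(parent_ids):
        h.update(parent.encode("ascii"))
        h.update(b"\n")
    h.update(content)
    return h.hexdigest()


def chain_ids(contents):
    """Return IDs for a linear history, each revision chained to the previous one."""
    ids = []
    for content in contents:
        parents = ids[-1:]  # empty list for the root revision
        ids.append(revision_id(content, parents))
    return ids


if __name__ == "__main__":
    original = chain_ids([b"r1", b"r2", b"r3"])
    tampered = chain_ids([b"r1 (edited)", b"r2", b"r3"])
    # Editing the first revision changes every later ID as well.
    print([a == b for a, b in zip(original, tampered)])
```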
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>

Re: reposurgeon now writes Subversion repositories

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Eric S. Raymond wrote on Sat, Dec 01, 2012 at 01:03:28 -0500:
> kmradke@rockwellcollins.com <km...@rockwellcollins.com>:
> > Possibly I'm naive, but a client provided email address is far
> > from being a GUID.  In fact, I can pretty much set my email address
> > to anything in most DVCS tools.  Who is to say I haven't used
> > your email address when committing?
> 
> Technically, nothing.  The underlying assumption is that you trust
> your contributors not to *want* to spoof each other.
> 
> Sure, it would be nice to have better authentication than that, but
> if you think for a bit you'll see that this is a very hard problem.  
> The cost of solving it would be so high that DVCSes have decided they have
> to ignore the spoofing case and hope everybody behaves well.
> 

Haven't a few projects decided to require PGP-signed revisions instead?

> So far, this has worked.
> -- 
> 		Eric S. Raymond <http://www.catb.org/~esr/>

Re: reposurgeon now writes Subversion repositories

Posted by "Eric S. Raymond" <es...@thyrsus.com>.
kmradke@rockwellcollins.com <km...@rockwellcollins.com>:
> Possibly I'm naive, but a client provided email address is far
> from being a GUID.  In fact, I can pretty much set my email address
> to anything in most DVCS tools.  Who is to say I haven't used
> your email address when committing?

Technically, nothing.  The underlying assumption is that you trust
your contributors not to *want* to spoof each other.

Sure, it would be nice to have better authentication than that, but
if you think for a bit you'll see that this is a very hard problem.  
The cost of solving it would be so high that DVCSes have decided they have
to ignore the spoofing case and hope everybody behaves well.

So far, this has worked.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>

Re: reposurgeon now writes Subversion repositories

Posted by Ben Reser <be...@reser.org>.
[I reordered some of the blocks from Eric's original to make my reply
flow a little better]

On Fri, Nov 30, 2012 at 1:53 PM, Eric S. Raymond <es...@thyrsus.com> wrote:
> When it happens, Subversion as it is now isn't going to be able to
> play. You guys have a three-way design adhesion between authentication
> identity, attribution identity, and local identity on the host server.
> For Subversion histories to be mobile, *that adhesion must be broken*.

No, we only have that three-way adhesion in one configuration:
svn+ssh when you use it with local users.

Even then you can avoid the whole thing by having users log into svn
as a single username, and set svn:author by putting this before each
user's ssh key in the authorized_keys file of that shared account:
command="/usr/bin/svnserve -t --tunnel-user user@example.com" ssh-rsa ...

In fact there is a specialized forge that is using this very
configuration with Subversion, Git and Mercurial:
http://www.wowace.com/wiki/repositories/ssh-public-keys/
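A forge using this single-account setup would generate one authorized_keys line per registered key. Each line pins svnserve's --tunnel-user to that user's Internet-scoped ID. This is a minimal sketch. The authorized_keys_line helper is hypothetical. The svnserve path and flags are copied from the line above. The shlex quoting assumes the shared account's login shell is POSIX-compatible. The no-* options are the usual restrictions for keys that should only run svnserve.

```python
import shlex

SVNSERVE = "/usr/bin/svnserve"  # path taken from the example above
# Restrictions commonly applied to svnserve-only keys.
RESTRICTIONS = "no-port-forwarding,no-agent-forwarding,no-X11-forwarding,no-pty"


def authorized_keys_line(tunnel_user, public_key):
    """Build one authorized_keys entry that runs svnserve as tunnel_user.

    public_key is the "ssh-rsa AAAA... comment" text the user uploaded.
    """
    key = public_key.strip()
    # A double quote would end the command="..." option; newlines would
    # split the entry across lines.
    if '"' in tunnel_user or "\n" in tunnel_user or "\n" in key:
        raise ValueError("unsafe characters in tunnel user or key")
    command = "%s -t --tunnel-user %s" % (SVNSERVE, shlex.quote(tunnel_user))
    return 'command="%s",%s %s' % (command, RESTRICTIONS, key)
```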

> If Subversion wants to continue to have a presence on next-generation
> forges, it's going to have to fix this. DVCSes show us how to solve
> the problem, but it isn't actually about centralized
> vs. decentralized.  It's about being welded to a specific host by its
> /etc/passwd vs. being able to migrate.

You can't seriously think that being dependent on local unix users
would be acceptable to the large corporate installations we support?
Obviously we support using separate authentication.  I'd say we had it
pretty much from day one, since the original server was httpd and had a
huge amount of flexibility in how to do authentication with it.
What's different here is that we don't necessarily separate the
attribution from the read/write authentications.  You yourself have
said that you didn't think that was important.

> I'm not handwaving about philosophy.  I'm pointing at a specific
> problem that comes up when you start thinking of an entire software
> project's state as a data object that you should be able to move
> around and re-instantiate on a different forge server.  By "entire
> state" I mean repositories, bugtracker contents, mailing list and
> forum messages, and member capabilities (who is an admin, who is a
> committer, etc.)

Well if the only solution you find acceptable is that the svn:author
is set from the client side then I think we are talking about
philosophy here.  I'm not sure if that's the only solution you find
acceptable but several configuration solutions have been suggested to
let you have usernames that have nothing to do with local unix users.

Re: reposurgeon now writes Subversion repositories

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On Fri, Nov 30, 2012 at 5:26 PM, Branko Čibej <br...@wandisco.com> wrote:

> When I create an account at [your favourite forge], I tell it my name
> and give it one of my e-mail addresses. It is, in my opinion, up to the
> forge software to use that in svn:author, not up to local user
> preferences. That /is/ a fundamental difference between the centralised
> and distributed models.
>
> Why do forges not do that? I don't know, but it's definitely not because
> Subversion doesn't give them fifteen ways of manipulating the svn:author
> property.
>

+1.

Furthermore, if the user wants to set their own custom revprops that have
nothing to do with svn:author (e.g., svn:federation-id) and end up with a
practice by convention rather than by server fiat that tools like
reposurgeon can understand, there's plenty of ways for the user *or the
server* to add in additional revprops as well.  Plus, since revprops aren't
versioned, they can always be fixed up after-the-fact to include
appropriate svn:federation-id tags...finally, a "feature" of revprops not
being versioned!  -- justin
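Because revprops are unversioned, such a fix-up can be scripted after the fact. The minimal sketch below only plans the work. For each revision whose svn:author appears in a hypothetical mapping table, it emits an `svn propset --revprop` command. The property name is the one suggested above. It is kept as a parameter in case a name outside the reserved svn: namespace turns out to be needed. The repository must also have a pre-revprop-change hook that permits the change.

```python
def fixup_commands(authors, id_map, url, prop="svn:federation-id"):
    """Plan revprop fix-ups for an existing repository.

    authors: {revision number: svn:author value}
    id_map:  {svn:author value: federation ID}  (hypothetical mapping)
    Returns a list of argv lists, one `svn propset --revprop` per revision.
    """
    commands = []
    for rev in sorted(authors):
        fed_id = id_map.get(authors[rev])
        if fed_id is None:
            continue  # unknown author: leave the revision alone
        commands.append(["svn", "propset", "--revprop", "-r", str(rev),
                         prop, fed_id, url])
    return commands
```

Each planned argv could then be run with `subprocess.check_call`, after reviewing the plan.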

Re: reposurgeon now writes Subversion repositories

Posted by Branko Čibej <br...@wandisco.com>.
On 30.11.2012 22:53, Eric S. Raymond wrote:

> The problem is that in order for that state to be mobile, none of it
> can have pointers to data that can't move off the host server.  In
> particular, *all user identities have to be Internet-scoped* rather
> than local Unix IDs.

At this point I have to ask if you've been reading our responses.

Nothing requires svn:author to contain Unix user IDs. Nothing prevents
the server from putting e-mail addresses, or even "Name Surname
<e...@mail>" strings into svn:author. We specifically designed the property
so that the server /can/ do this, and there are several widely-used
mechanisms for doing exactly that, regardless of access method;
http[s]://, svn+ssh://, svn:// (with or without SASL) all give the
administrator the hooks to do this.

When I create an account at [your favourite forge], I tell it my name
and give it one of my e-mail addresses. It is, in my opinion, up to the
forge software to use that in svn:author, not up to local user
preferences. That /is/ a fundamental difference between the centralised
and distributed models.

Why do forges not do that? I don't know, but it's definitely not because
Subversion doesn't give them fifteen ways of manipulating the svn:author
property.

Now that's not saying I'm categorically against letting the user set
some revision property automatically on commit. But the problem of
svn:author you're describing is simply what you assert it to be.


-- Brane

-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com


Re: reposurgeon now writes Subversion repositories

Posted by km...@rockwellcollins.com.
"Eric S. Raymond" <es...@thyrsus.com> wrote on 11/30/2012 03:53:45 PM:
> Ben Reser <be...@reser.org>:
> > This is really a philosophical difference between a centralized
> > version control system and DVCS.
> 
> No, no, no. If you think that's true, I have failed to communicate.
> 
> I'm not handwaving about philosophy.  I'm pointing at a specific
> problem that comes up when you start thinking of an entire software
> project's state as a data object that you should be able to move
> around and re-instantiate on a different forge server.  By "entire
> state" I mean repositories, bugtracker contents, mailing list and
> forum messages, and member capabilities (who is an admin, who is a
> committer, etc.)
> 
> Motivation: we *want* project state to be mobile because relying on
> any given forge server to be stable forever is too risky.  On decadal
> timescales this is a real, serious problem.  Berlios's near-collapse
> sensitized me about this.
> 
> The problem is that in order for that state to be mobile, none of it
> can have pointers to data that can't move off the host server.  In
> particular, *all user identities have to be Internet-scoped* rather
> than local Unix IDs.  Otherwise, when I try to move project foo to
> server bar, there will be friction in the form of potential name
> collisions that are messy to resolve.

Possibly I'm naive, but a client provided email address is far
from being a GUID.  In fact, I can pretty much set my email address
to anything in most DVCS tools.  Who is to say I haven't used
your email address when committing?  I can easily imagine something
replacing internet email at some point, so foo@here.com might be
pretty much meaningless on a decadal scale.

Subversion allows the server hosting the data to authenticate and
then manipulate the author id how it sees fit.  Using your federation
you could easily enforce the author field for the hosted Subversion
repositories to be an email address, if that meets your
"internet scoped" concept...

This would preclude any random Subversion repository, but that
same limitation would apply to any random DVCS data as well.

Relying on a user-provided value to be globally unique seems
like a bad idea.  I must be missing something obvious...

Kevin R.

Re: reposurgeon now writes Subversion repositories

Posted by "Eric S. Raymond" <es...@thyrsus.com>.
Ben Reser <be...@reser.org>:
> This is really a philosophical difference between a centralized
> version control system and DVCS.

No, no, no. If you think that's true, I have failed to communicate.

I'm not handwaving about philosophy.  I'm pointing at a specific
problem that comes up when you start thinking of an entire software
project's state as a data object that you should be able to move
around and re-instantiate on a different forge server.  By "entire
state" I mean repositories, bugtracker contents, mailing list and
forum messages, and member capabilities (who is an admin, who is a
committer, etc.)

Motivation: we *want* project state to be mobile because relying on
any given forge server to be stable forever is too risky.  On decadal
timescales this is a real, serious problem.  Berlios's near-collapse
sensitized me about this.

The problem is that in order for that state to be mobile, none of it
can have pointers to data that can't move off the host server.  In
particular, *all user identities have to be Internet-scoped* rather
than local Unix IDs.  Otherwise, when I try to move project foo to
server bar, there will be friction in the form of potential name
collisions that are messy to resolve.
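The name-collision friction can be made concrete. Suppose two forges' user tables are keyed by local ID. A local ID present on both sides but belonging to different people needs manual resolution. Keying by an Internet-scoped ID, such as an email address, makes the merge mechanical. This is a toy sketch. The table layout and the use of email as the global key are assumptions for illustration.

```python
def local_id_collisions(source, target):
    """Local IDs that exist on both forges but name different people.

    source, target: {local user ID: Internet-scoped ID (e.g. email)}
    """
    return sorted(uid for uid in source.keys() & target.keys()
                  if source[uid] != target[uid])


def merge_by_global_id(source, target):
    """Merge user tables keyed by Internet-scoped ID; no collisions are possible.

    Returns {global ID: set of local IDs seen for it on either forge}.
    """
    merged = {}
    for table in (source, target):
        for uid, global_id in table.items():
            merged.setdefault(global_id, set()).add(uid)
    return merged
```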

It turns out when you dive into this problem (and I have) that Unix
user IDs are *the* blocker.  There's almost nothing else in a forge's
ontology that can't be resolved in a fairly trivial way.  The only
exception is that project names themselves may collide.

The next generation of forges is going to have to fix this problem.
And it actually isn't terribly difficult to fix - it's creative
engineering, but not blue-sky R&D.  I could certainly implement it,
and I will if I can budget the time. I've done a lot of the design
work already.

When it happens, Subversion as it is now isn't going to be able to
play. You guys have a three-way design adhesion between authentication
identity, attribution identity, and local identity on the host server.
For Subversion histories to be mobile, *that adhesion must be broken*.

If Subversion wants to continue to have a presence on next-generation
forges, it's going to have to fix this. DVCSes show us how to solve
the problem, but it isn't actually about centralized
vs. decentralized.  It's about being welded to a specific host by its
/etc/passwd vs. being able to migrate.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>

Re: reposurgeon now writes Subversion repositories

Posted by Mark Phippard <ma...@gmail.com>.
On Fri, Nov 30, 2012 at 4:05 PM, Ben Reser <be...@reser.org> wrote:

> I'm not sure it's something we want to change for everyone.  I suspect
> you're the first person that's ever raised any complaints about this.
> This is really a philosophical difference between a centralized
> version control system and DVCS.
>
> Depending on your repository access setup there are ways around this.
> If the server side knows the value you want to put in svn:author
> instead of the authenticated user name then it's pretty trivial to fix
> with a hook script.
>
> In my opinion the thing to do here is to allow auto revision
> properties (possibly on a per-repo/server basis).  Then we could pass
> along some user configured extra value with each commit and server
> admins could decide to put a hook script in place that puts it in
> svn:author if they wanted.

I would also just say that most software forges that provide SVN
access use the http protocol and it is not difficult or uncommon to
set the author field to an email address when using the Apache server.
It does not necessarily require a hook script to do this.  When using
LDAP authentication, as an example, you can simply choose which LDAP
attribute will populate the REMOTE_USER variable, which will in turn
populate svn:author.

http://httpd.apache.org/docs/2.2/mod/mod_authnz_ldap.html#authldapremoteuserattribute

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: reposurgeon now writes Subversion repositories

Posted by Ben Reser <be...@reser.org>.
On Fri, Nov 30, 2012 at 12:51 PM, Eric S. Raymond <es...@thyrsus.com> wrote:
> I'm not sure what the entire right design fix for Subversion is here, but
> I *am* sure you guys should be paying attention to this now so you can have
> the fix ready and deployed by the time forge evolution makes it urgent, which
> I would say will be on the close order of three years out.  Sooner, if I
> actually get to pay concentrated attention to the problem.
>
> So think: What would it take to divorce identity-for-attribution from
> identity-for-authentication, making the former user-settable the way
> DVCSes do?  The challenge, of course, is doing it upward-compatibly.
>
> I apologize for not being able to make any concrete suggestions here;
> outside of the dumpfile format I don't know your code and protocols
> well enough.

I'm not sure it's something we want to change for everyone.  I suspect
you're the first person that's ever raised any complaints about this.
This is really a philosophical difference between a centralized
version control system and DVCS.

Depending on your repository access setup there are ways around this.
If the server side knows the value you want to put in svn:author
instead of the authenticated user name then it's pretty trivial to fix
with a hook script.

In my opinion the thing to do here is to allow auto revision
properties (possibly on a per-repo/server basis).  Then we could pass
along some user configured extra value with each commit and server
admins could decide to put a hook script in place that puts it in
svn:author if they wanted.
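This proposal splits the job in two: the client sends an extra revprop, and an admin-controlled hook decides whether to promote it into svn:author. The hook's decision logic might look like this minimal sketch. The client property name and the `trusted` policy switch are hypothetical. A real pre-commit hook would read and write the transaction's properties, for example with svn_fs_txn_prop and svn_fs_change_txn_prop as in the Python hook shown earlier.

```python
CLIENT_PROP = "x-attribution"  # hypothetical client-supplied revprop name


def author_after_hook(revprops, trusted=True):
    """Return the svn:author value a pre-commit hook would store.

    revprops: the transaction's revision properties, as a dict.
    trusted:  admin policy switch; when False, the authenticated
              svn:author is always kept.
    """
    authenticated = revprops.get("svn:author")
    claimed = (revprops.get(CLIENT_PROP) or "").strip()
    if not trusted or not claimed:
        return authenticated
    return claimed
```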

If that's something we want to do as a project or not I don't know.

Re: reposurgeon now writes Subversion repositories

Posted by Branko Čibej <br...@wandisco.com>.
On 30.11.2012 21:51, Eric S. Raymond wrote:
> Ben Reser <be...@reser.org>:
>> But again, if you want svn:author to be set to some value a user sets
>> locally and that doesn't necessarily have anything to do with
>> their authentication to the server, you can't do that with username
>> configuration, since there is absolutely no guarantee that username
>> will even be sent to the server.
> Alas. I hear you reporting that identity-for-attribution and
> identity-for-authentication are not cleanly enough separated for the
> user to be able to reliably set the latter to a chosen Internet-scoped ID.
>
> Subversion is still off Federation's list, then.  This may not matter
> in itself, as Federation only exists as design ideas at present.  But
> I strongly believe future forges are going to go in the direction I'm
> pointing, even if it doesn't happen to be me leading the way. The wins
> from being able to marshal and cross-load project states are just too
> large to be ignored much longer.  And Subversion as it is now won't
> be ready to play in that world.
>
> I'm not sure what the entire right design fix for Subversion is here, but
> I *am* sure you guys should be paying attention to this now so you can have
> the fix ready and deployed by the time forge evolution makes it urgent, which
> I would say will be on the close order of three years out.  Sooner, if I
> actually get to pay concentrated attention to the problem.
>
> So think: What would it take to divorce identity-for-attribution from
> identity-for-authentication, making the former user-settable the way
> DVCSes do?  The challenge, of course, is doing it upward-compatibly.
>
> I apologize for not being able to make any concrete suggestions here;
> outside of the dumpfile format I don't know your code and protocols
> well enough.

The obvious easy answer is to introduce a new reserved property name for
attribution, without changing the semantics of svn:author.

In retrospect it's unfortunate we didn't call that svn:committer-id or
some such, but there you have it.
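On the reading side, a tool such as reposurgeon could prefer a dedicated attribution property when one is present. It would fall back to svn:author otherwise, leaving svn:author's semantics untouched. This is a minimal sketch. svn:committer-id is only the name floated above, not an existing Subversion property.

```python
ATTRIBUTION_PROP = "svn:committer-id"  # hypothetical, per the suggestion above


def attribution(revprops):
    """Pick the identity to credit for a revision.

    Prefers the (hypothetical) attribution property, then falls back to
    svn:author, which remains the authenticated identity. Returns None
    for revisions with neither (e.g. anonymous commits).
    """
    for name in (ATTRIBUTION_PROP, "svn:author"):
        value = revprops.get(name)
        if value:
            return value
    return None
```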

-- Brane

-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com


Re: reposurgeon now writes Subversion repositories

Posted by "Eric S. Raymond" <es...@thyrsus.com>.
Ben Reser <be...@reser.org>:
> But again, if you want svn:author to be set to some value a user sets
> locally and that doesn't necessarily have anything to do with
> their authentication to the server, you can't do that with username
> configuration, since there is absolutely no guarantee that username
> will even be sent to the server.

Alas. I hear you reporting that identity-for-attribution and
identity-for-authentication are not cleanly enough separated for the
user to be able to reliably set the latter to a chosen Internet-scoped ID.

Subversion is still off Federation's list, then.  This may not matter
in itself, as Federation only exists as design ideas at present.  But
I strongly believe future forges are going to go in the direction I'm
pointing, even if it doesn't happen to be me leading the way. The wins
from being able to marshal and cross-load project states are just too
large to be ignored much longer.  And Subversion as it is now won't
be ready to play in that world.

I'm not sure what the entire right design fix for Subversion is here, but
I *am* sure you guys should be paying attention to this now so you can have
the fix ready and deployed by the time forge evolution makes it urgent, which
I would say will be on the close order of three years out.  Sooner, if I
actually get to pay concentrated attention to the problem.

So think: What would it take to divorce identity-for-attribution from
identity-for-authentication, making the former user-settable the way
DVCSes do?  The challenge, of course, is doing it upward-compatibly.

I apologize for not being able to make any concrete suggestions here;
outside of the dumpfile format I don't know your code and protocols
well enough.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>

Re: reposurgeon now writes Subversion repositories

Posted by Ben Reser <be...@reser.org>.
On Fri, Nov 30, 2012 at 8:03 AM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> It does yes, if the server has ForceCommand='svnserve -t' configured
> in sshd, then path-based authz and/or "anon-access=none" can be
> meaningfully set up --- and these key off of the svn-level authenticated
> username (as opposed to the OS-level username).

He means svnserve -i, in which case you're using your ssh server like
inetd, which is a little different from what I think you're saying
you're doing.

Re: reposurgeon now writes Subversion repositories

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Eric S. Raymond wrote on Fri, Nov 30, 2012 at 10:17:39 -0500:
> Ben Reser <be...@reser.org>:
> > Precisely.  I was under the impression that he wanted something that
> > was user controlled and had nothing to do with the authentication to
> > display as the author.
> 
> Maybe I'm confused.  Or perhaps we're using "authentication" in
> different senses on different levels.  I think there's a question I 
> should have asked sooner...
> 
> Normally, access to the Subversion repositories I use is actually authenticated 
> via an ssh key used for login to the server host.  I'm not sure in what sense
> the username field actually contributes any security-relevant information
> in a setup like that.  Does it?

It does yes, if the server has ForceCommand='svnserve -t' configured
in sshd, then path-based authz and/or "anon-access=none" can be
meaningfully set up --- and these key off of the svn-level authenticated
username (as opposed to the OS-level username).

Re: reposurgeon now writes Subversion repositories

Posted by Ben Reser <be...@reser.org>.
On Fri, Nov 30, 2012 at 7:17 AM, Eric S. Raymond <es...@thyrsus.com> wrote:
> Normally, access to the Subversion repositories I use is actually authenticated
> via an ssh key used for login to the server host.  I'm not sure in what sense
> the username field actually contributes any security-relevant information
> in a setup like that.  Does it?

That's what Branko and I are trying to say.  username is not sent to
the server outside of whatever authentication method the protocol
uses.

In the case of svn+ssh username is never going to be sent to the
server since you're going to use the tunnel's authentication.  So the
svn:author field will be filled with --tunnel-user if passed to
svnserve or the username of the uid svnserve is running as.
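The svn+ssh rule can be stated as a small function: use the tunnel user if one was given, otherwise the login name of the uid svnserve runs as. This is a sketch for Unix hosts using the standard pwd module. It only restates the rule and is not how svnserve itself is implemented.

```python
import os
import pwd


def svn_ssh_author(tunnel_user=None, uid=None):
    """svn:author for a commit over svn+ssh, per the rule described above.

    tunnel_user: value of svnserve's --tunnel-user option, if any.
    uid: uid svnserve runs as; defaults to the current effective uid.
    """
    if tunnel_user:
        return tunnel_user
    if uid is None:
        uid = os.geteuid()
    return pwd.getpwuid(uid).pw_name
```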

The issue you're seeing here is actually called out in the book:
http://svnbook.red-bean.com/en/1.7/svn.serverconfig.choosing.html

[[[
* Only one choice of authentication method is available.
]]]

and

[[[
If you have an existing infrastructure that is heavily based on SSH
accounts, and if your users already have system accounts on your
server machine, it makes sense to deploy an svnserve-over-SSH
solution. Otherwise, we don't widely recommend this option to the
public. It's generally considered safer to have your users access the
repository via (imaginary) accounts managed by svnserve or Apache,
rather than by full-blown system accounts. If your deep desire for
encrypted communication still draws you to this option, we recommend
using Apache with SSL or svnserve with SASL encryption instead.
]]]

For your particular use I'd think that http would be a much better
access method since Apache provides you a lot more flexibility with
the authentication method.

But again, if you want svn:author to be set to some value a user sets
locally and that doesn't necessarily have anything to do with
their authentication to the server, you can't do that with username
configuration, since there is absolutely no guarantee that username
will even be sent to the server.

Re: reposurgeon now writes Subversion repositories

Posted by "Eric S. Raymond" <es...@thyrsus.com>.
Ben Reser <be...@reser.org>:
> Precisely.  I was under the impression that he wanted something that
> was user controlled and had nothing to do with the authentication to
> display as the author.

Maybe I'm confused.  Or perhaps we're using "authentication" in
different senses on different levels.  I think there's a question I 
should have asked sooner...

Normally, access to the Subversion repositories I use is actually authenticated 
via an ssh key used for login to the server host.  I'm not sure in what sense
the username field actually contributes any security-relevant information
in a setup like that.  Does it?
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

Re: reposurgeon now writes Subversion repositories

Posted by Ben Reser <be...@reser.org>.
On Thu, Nov 29, 2012 at 11:25 AM, Eric S. Raymond <es...@thyrsus.com> wrote:
> Actually that wasn't in my plan.  It's sufficient that every commit
> get an Internet-scoped ID; anonymity isn't required.

Well, without making any changes, you have two choices:

1) Have users use whatever local username for the authentication.
Have the software forge have them put in an email address or whatever
value you want to show in svn:author and then replace svn:author in a
pre-commit hook.

2) Have users authenticate with the email address or whatever value
you want to show in svn:author and have your authn setup for the
server deal with that.  No hook needed.

But those two options are not what I thought you intended.

I'd originally thought you were just going for open access (what we
call anonymous access).  In which case the suggestion of the username
doesn't help because the only time that gets used is when
authentication is setup.

On Fri, Nov 30, 2012 at 1:40 AM, Branko Čibej <br...@wandisco.com> wrote:
> And besides, "username" is the authentication token, which is usually
> exactly what Eric doesn't want to put into svn:author. :)

Precisely.  I was under the impression that he wanted something that
was user controlled and had nothing to do with the authentication to
display as the author.

Re: reposurgeon now writes Subversion repositories

Posted by Branko Čibej <br...@wandisco.com>.
On 30.11.2012 09:04, Daniel Shahaf wrote:
> Martin Furter wrote on Fri, Nov 30, 2012 at 09:51:20 +0530:
>> On 11/30/12 00:55, Eric S. Raymond wrote:
>>> Ben Reser<be...@reser.org>:
>>>> The only thing that's really lacking here is a good way to pass along
>>>> extra property values in an easy to configure way per
>>>> server/repository so that you can use a client defined value to put it
>>>> in svn:author.  I don't really see adding support for something like
>>>> that as terribly difficult.  The only caveat I would make is that you
>>>> should realize the change here is a client side change and that it'll
>>>> take some time for users to upgrade clients (most distros are still
>>>> shipping SVN 1.6 over a year after 1.7 released).
>>>>
>>>> Once you have something like that, you can expose it to the hook
>>>> scripts and they can change the svn:author field to whatever the local
>>>> repository prefers.  If local repositories want to store the local
>>>> authenticated user in a different property they can also do this.
>>> Sounds like I should write that patch to make a preferred-ID string
>>> available out of ~/.subversion/config, then.  As soon as possible.
>> I wanted to reply that this should go into ~/.subversion/servers. But I
>> found the entry "username" in there. So just add a username entry to the  
>> global section.
> If you can have a username= in the per-server section, you probably can
> have it _today_ in the [global] section too and it would take effect
> (just like N other options that can be set at either global or
> per-server scope)...
>
> So you'd need to invent a new option?

And besides, "username" is the authentication token, which is usually
exactly what Eric doesn't want to put into svn:author. :)

-- Brane

-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com


Re: reposurgeon now writes Subversion repositories

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Martin Furter wrote on Fri, Nov 30, 2012 at 09:51:20 +0530:
> On 11/30/12 00:55, Eric S. Raymond wrote:
>> Ben Reser<be...@reser.org>:
>>> The only thing that's really lacking here is a good way to pass along
>>> extra property values in an easy to configure way per
>>> server/repository so that you can use a client defined value to put it
>>> in svn:author.  I don't really see adding support for something like
>>> that as terribly difficult.  The only caveat I would make is that you
>>> should realize the change here is a client side change and that it'll
>>> take some time for users to upgrade clients (most distros are still
>>> shipping SVN 1.6 over a year after 1.7 released).
>>>
>>> Once you have something like that, you can expose it to the hook
>>> scripts and they can change the svn:author field to whatever the local
>>> repository prefers.  If local repositories want to store the local
>>> authenticated user in a different property they can also do this.
>>
>> Sounds like I should write that patch to make a preferred-ID string
>> available out of ~/.subversion/config, then.  As soon as possible.
>
> I wanted to reply that this should go into ~/.subversion/servers. But I
> found the entry "username" in there. So just add a username entry to the  
> global section.

If you can have a username= in the per-server section, you probably can
have it _today_ in the [global] section too and it would take effect
(just like N other options that can be set at either global or
per-server scope)...

So you'd need to invent a new option?

Re: reposurgeon now writes Subversion repositories

Posted by Martin Furter <mf...@bluewin.ch>.
On 11/30/12 00:55, Eric S. Raymond wrote:
> Ben Reser<be...@reser.org>:
>> The only thing that's really lacking here is a good way to pass along
>> extra property values in an easy to configure way per
>> server/repository so that you can use a client defined value to put it
>> in svn:author.  I don't really see adding support for something like
>> that as terribly difficult.  The only caveat I would make is that you
>> should realize the change here is a client side change and that it'll
>> take some time for users to upgrade clients (most distros are still
>> shipping SVN 1.6 over a year after 1.7 released).
>>
>> Once you have something like that, you can expose it to the hook
>> scripts and they can change the svn:author field to whatever the local
>> repository prefers.  If local repositories want to store the local
>> authenticated user in a different property they can also do this.
>
> Sounds like I should write that patch to make a preferred-ID string
> available out of ~/.subversion/config, then.  As soon as possible.

I wanted to reply that this should go into ~/.subversion/servers. But I
found the entry "username" in there. So just add a username entry to the 
global section.

HTH,
Martin

Re: reposurgeon now writes Subversion repositories

Posted by "Eric S. Raymond" <es...@thyrsus.com>.
Ben Reser <be...@reser.org>:
> The only thing that's really lacking here is a good way to pass along
> extra property values in an easy to configure way per
> server/repository so that you can use a client defined value to put it
> in svn:author.  I don't really see adding support for something like
> that as terribly difficult.  The only caveat I would make is that you
> should realize the change here is a client side change and that it'll
> take some time for users to upgrade clients (most distros are still
> shipping SVN 1.6 over a year after 1.7 released).
> 
> Once you have something like that, you can expose it to the hook
> scripts and they can change the svn:author field to whatever the local
> repository prefers.  If local repositories want to store the local
> authenticated user in a different property they can also do this.

Sounds like I should write that patch to make a preferred-ID string
available out of ~/.subversion/config, then.  As soon as possible.
 
> Given your goal that users shouldn't notice I'm going to assume you're
> allowing anonymous commits.

Actually that wasn't in my plan.  It's sufficient that every commit
get an Internet-scoped ID; anonymity isn't required.

My plan about "users don't notice" works like this.  Multiple instances
of my forge (the design's name is "Federation") each have lists of peer
instances. They flood project-index updates to each other.  When you do
a transaction about project foo with instance bar, it looks in the
index and transparently proxies for wherever the project actually is.
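As a rough illustration of the flood-and-proxy scheme just described, here is a minimal Python sketch. Everything in it is hypothetical: the class name ForgeInstance, the per-project version number used to stop the flood, and the route() lookup are assumptions made for illustration, not part of any actual Federation code.

```python
class ForgeInstance:
    """Hypothetical forge instance that floods project-index updates to its peers."""

    def __init__(self, name):
        self.name = name
        self.peers = []
        # project name -> (version, name of the instance that hosts it)
        self.index = {}

    def connect(self, other):
        # Peering is symmetric: each side floods updates to the other.
        self.peers.append(other)
        other.peers.append(self)

    def host(self, project, version):
        # Start hosting a project here; version is assumed to grow on every move.
        self.receive(project, version, self.name)

    def receive(self, project, version, home):
        seen = self.index.get(project)
        if seen is not None and seen[0] >= version:
            return  # already have this update or a newer one: stop flooding here
        self.index[project] = (version, home)
        for peer in self.peers:
            peer.receive(project, version, home)

    def route(self, project):
        # Name of the instance a request about this project should be proxied to.
        return self.index[project][1]


a, b, c = ForgeInstance("a"), ForgeInstance("b"), ForgeInstance("c")
a.connect(b)
b.connect(c)
a.host("foo", 1)
print(c.route("foo"))   # a
c.host("foo", 2)        # the project migrates to instance c
print(a.route("foo"))   # c
```

The version check is what keeps the flood from looping forever around cycles in the peer graph; a real design would also need to handle two instances announcing the same version concurrently, which this sketch simply ignores.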

Another consequence of this design is that you can back up your project
state yourself by asking the forge federation to send you the same blob 
it would pass around if a peer said "Aaargh! I'm about to die!".

With a little work, the federated instances could set up a rolling-backup
scheme that would protect pretty well against unplanned server deaths.

I designed this after Berlios told the world it was going down, and
potentially taking hundreds of projects with it.  It's still up, but
I don't ever again want to have to bet everything on the eternal
stability of a single forge site.

This is a serious vulnerability in the open-source infrastructure.
I want to fix it.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

Re: reposurgeon now writes Subversion repositories

Posted by Ben Reser <be...@reser.org>.
On Thu, Nov 29, 2012 at 5:49 AM, Eric S. Raymond <es...@thyrsus.com> wrote:
> I say break the hell out of it.  The utility of Internet-scoped
> attributions is pretty high in a bunch of different ways (I love me
> some Ohloh statistics, there's one).  And I doubt "server
> verification" actually buys you much.  Whatever it does buy you, you
> can keep by sticking the "verified" username in a property that
> auditing tools can see but users don't need to.
>
> Let me give you a major for-instance. I have been seriously thinking
> for a couple of years about writing a better software forge.  Many and
> various are the ways in which the existing ones all suck, but the
> single worst problem with them is probably that migrating project
> state between instances ranges from hard to impossible.
>
> For reasons I shouldn't need to explain, we really want to live in a
> world where a forge instance that's about to undergo planned shutdown
> can squirt its project states to a bunch of peers and have each one go
> seamlessly live on a new host.  If the forge federation is designed
> right, users shouldn't even have to know when this happens.
>
> So, guess why I had to cross Subversion off the list of VCSes my design
> would support?  Yes, that's right - system-local usernames in the forge
> database and VCSes are the single most severe point of adhesion.  I had
> to get rid of them entirely, just as DVCSes have.
>
> Subversion should do likewise.

But as people have already said, the meaning of the svn:author field is
locally defined by the repository.  There is nothing preventing your
proposed software forge from defining that the author field is some
different type of value, e.g. the user's email address.

Heck, the book even says that you shouldn't assume that svn:author is
even set (I believe it won't be set if you allow anonymous commits):
http://svnbook.red-bean.com/en/1.7/svn.advanced.props.html

The only thing that's really lacking here is a good way to pass along
extra property values in an easy to configure way per
server/repository so that you can use a client defined value to put it
in svn:author.  I don't really see adding support for something like
that as terribly difficult.  The only caveat I would make is that you
should realize the change here is a client side change and that it'll
take some time for users to upgrade clients (most distros are still
shipping SVN 1.6 over a year after 1.7 released).

Once you have something like that, you can expose it to the hook
scripts and they can change the svn:author field to whatever the local
repository prefers.  If local repositories want to store the local
authenticated user in a different property they can also do this.

Given your goal that users shouldn't notice I'm going to assume you're
allowing anonymous commits.  Right now there's really no definition of
how svn:author behaves with anonymous commits.  So I'd say that it
would be perfectly reasonable to define such a configurable value and
default to filling svn:author with it in the case of anonymous
commits.

Re: reposurgeon now writes Subversion repositories

Posted by "Eric S. Raymond" <es...@thyrsus.com>.
Branko Čibej <br...@wandisco.com>:
> Well, I find that we don't actually spell out anywhere that svn:author
> can be pretty much any UTF-8 string. It can even contain newlines,
> although that's not recommended.

No kidding.  That's an edge case *I* surely wouldn't want to screw with.
 
> Hint: svn commit --with-revprop svn:author="Twizzle Strongpants
> <ts...@interwebs>"

Iiiiinterressssting.
 
> I'm open to suggestions, up to and including breaking that assumption,
> though obviously I'd prefer not to.

I say break the hell out of it.  The utility of Internet-scoped
attributions is pretty high in a bunch of different ways (I love me
some Ohloh statistics, there's one).  And I doubt "server
verification" actually buys you much.  Whatever it does buy you, you
can keep by sticking the "verified" username in a property that
auditing tools can see but users don't need to.

Let me give you a major for-instance. I have been seriously thinking
for a couple of years about writing a better software forge.  Many and
various are the ways in which the existing ones all suck, but the
single worst problem with them is probably that migrating project
state between instances ranges from hard to impossible.

For reasons I shouldn't need to explain, we really want to live in a
world where a forge instance that's about to undergo planned shutdown
can squirt its project states to a bunch of peers and have each one go
seamlessly live on a new host.  If the forge federation is designed
right, users shouldn't even have to know when this happens.

So, guess why I had to cross Subversion off the list of VCSes my design
would support?  Yes, that's right - system-local usernames in the forge
database and VCSes are the single most severe point of adhesion.  I had
to get rid of them entirely, just as DVCSes have.

Subversion should do likewise.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

Re: reposurgeon now writes Subversion repositories

Posted by Branko Čibej <br...@wandisco.com>.
On 29.11.2012 12:46, Eric S. Raymond wrote:
> Daniel Shahaf <da...@elego.de>:
>>> Subversion's metadata doesn't have separate author and committer
>>> properties, and doesn't store anything but a Unix user ID as
>>> attribution.  I don't see any way around this.
>> You're not fully informed, then.
>>
>> 1) svn:author revprops can contain any UTF-8 string.  They are not
>> restricted to Unix user id's.  (For example, they can contain full
>> names, if the administrator so chooses.)
> Right.  At one point during the development of this feature I was
> accidentally storing the full email field in this property.  So I
> already knew that this is allowed at some level.  
>
> And, I have no trouble believing that svn log will cheerfully echo
> anything that I choose to stuff in that field.  
>
> But...
>
> (1) How much work would it be to set up a Subversion installation
> so that when I svn commit, the tool does the right thing, e.g. puts
> a DVCS-style fullname/email string in there?

I don't know how common that practice is, but I've worked on a project
where svn:author was filled in from the DN and e-mail attributes of an
X.509 certificate. It's also quite easy to set svn:author from
information stored in LDAP (that is, if you find anything about LDAP
actually easy).

> (2) Have the tools been tested for bugs arising from having whitespace
> in that data?

Which tools? If you mean Subversion libs and command-line client, then yes.

> Really, if it's actually easy to set up DVCS-style globally unique IDs you
> Subversion guys ought to be shouting it from the housetops.  The absence
> of this capability is a serious PITA in several situations, including 
> for example migrating projects between forges.

Well, I find that we don't actually spell out anywhere that svn:author
can be pretty much any UTF-8 string. It can even contain newlines,
although that's not recommended.

> RFC: If I wrote a patch that let Subversion users set their own
> content string for the author field in ~/.subversion/config, would
> you merge it?  Because I'd totally write that.

Hint: svn commit --with-revprop svn:author="Twizzle Strongpants <ts...@interwebs>"

I personally wouldn't mind if that were a user preference in the config
file. It'd have to be a per-server config option, however; and even
better, per-repository, which is a concept that the Subversion config
file does not currently support. (There's a reason why I put my ID into
.git/config, not ~/.gitconfig.)

Note that it's up to the server administrator to actually allow clients
to set svn:author (and any other revision property). The assumed, and
most common, configuration is that the server derives svn:author from
authentication information.

[...]

>> You might also seek community consensus to reserve an svn:foo name for
>> the "original author" property --- perhaps svn:original-author --- so
>> that reposurgeon and other git->svn tools can interoperate in the way
>> they transfer the "original author" information.
> OK.  But I like the idea of letting the users set their own author
> content string better.  Instead of another layer of kluges, why
> shouldn't Subversion join the DVCSes in the happy land of
> Internet-scoped attributions?

This discussion has come up before. Today, the assumption that
svn:author is something that the server has verified (modulo admins'
shenanigans) is pretty much cast in concrete.

I'm open to suggestions, up to and including breaking that assumption,
though obviously I'd prefer not to.

-- Brane


-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com


Re: reposurgeon now writes Subversion repositories

Posted by "Eric S. Raymond" <es...@thyrsus.com>.
Daniel Shahaf <da...@elego.de>:
> I don't see the kludge here --- git has an "author" != "committer"
> distinction, svn doesn't, so if you want to grow that distinction the
> most natural way is a new property.  Storing additional information in
> svn:author is a separate issue.

See my advocacy to Branko of going to Internet-scoped IDs. The kludge
would be maintaining the local and Internet-scoped identifications 
as different properties and having to decide which one to key on
ad-hoc.  Nothing to do with the author/committer distinction. 
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

Re: reposurgeon now writes Subversion repositories

Posted by Daniel Shahaf <da...@elego.de>.
(note, other half of the thread is on dev@svn only..)

Eric S. Raymond wrote on Thu, Nov 29, 2012 at 06:46:37 -0500:
> Daniel Shahaf <da...@elego.de>:
> > You might also seek community consensus to reserve an svn:foo name for
> > the "original author" property --- perhaps svn:original-author --- so
> > that reposurgeon and other git->svn tools can interoperate in the way
> > they transfer the "original author" information.
> 
> OK.  But I like the idea of letting the users set their own author
> content string better.  Instead of another layer of kluges, why

I don't see the kludge here --- git has an "author" != "committer"
distinction, svn doesn't, so if you want to grow that distinction the
most natural way is a new property.  Storing additional information in
svn:author is a separate issue.

> > >   Subversion has a concept of "flows"; that is, named segments of
> > >   history corresponding to files or directories that are created when
> > >   the path is added, cloned when the path is copied, and deleted when
> > >   the path is deleted. This information is not preserved in import
> > >   streams or the internal representation that reposurgeon uses.  Thus,
> > >   after editing, the flow boundaries of a Subversion history may be
> > >   arbitrarily changed.
> > > 
> > > This is me being obsessive about documenting the details.  I think it
> > > is doubtful that most Subversion users even know flows exist.
> > 
> > I think you're saying that adds might turn into copies, and vice-versa.
> > That is something users would notice --- it is certainly exposed in the
> > UI --- even though node-id's are not exposed to clients.
> 
> I'm saying nobody thinks of flows when they do branch copies.  It's
> not just that users don't see node IDs, it's that no part of most users'
> mental model of how Subversion works resembles them.

I'm still not sure what you have in mind.  I note that 'svn log' and
'svn blame' cross both file copies and branch creation --- that's one
effect of "'svn cp foo bar; svn ci' causes bar to be related to foo".
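The point about history following copies can be made concrete with a toy model. This is a hypothetical Python sketch, not Subversion's node-ID machinery: each path records the revisions at which it was created, either by a plain add or by a copy from a (source path, source revision) pair, and lineage() walks copy-from links backwards the way 'svn log' crosses a copy. Directory copies and deletions are deliberately left out.

```python
from typing import NamedTuple, Optional, Tuple


class Birth(NamedTuple):
    rev: int                                  # revision that created the path
    copied_from: Optional[Tuple[str, int]]    # (source path, source rev), or None for an add


def lineage(path, rev, births):
    """Return the (path, first_rev) history segments visible from path@rev.

    births maps each path to its Birth records in increasing revision order.
    """
    chain = []
    while True:
        # The latest creation of this path at or before rev starts the segment.
        candidates = [b for b in births.get(path, []) if b.rev <= rev]
        if not candidates:
            raise KeyError(f"{path} does not exist in r{rev}")
        birth = candidates[-1]
        chain.append((path, birth.rev))
        if birth.copied_from is None:
            return chain                      # a plain add: history stops here
        path, rev = birth.copied_from         # a copy: keep walking in the source


# r1 adds foo.c; r3 copies foo.c@2 to bar.c
with_copy = {"foo.c": [Birth(1, None)], "bar.c": [Birth(3, ("foo.c", 2))]}
# the same change if a round trip recorded bar.c as a plain add instead
as_add = {"foo.c": [Birth(1, None)], "bar.c": [Birth(3, None)]}

print(lineage("bar.c", 4, with_copy))   # [('bar.c', 3), ('foo.c', 1)]
print(lineage("bar.c", 4, as_add))      # [('bar.c', 3)]
```

The two dictionaries hold identical file contents at every revision; only the copy record differs, and that alone decides whether the history shown for bar.c reaches back into foo.c.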

> -- 
> 		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

Re: reposurgeon now writes Subversion repositories

Posted by "Eric S. Raymond" <es...@thyrsus.com>.
Daniel Shahaf <da...@elego.de>:
> > Subversion's metadata doesn't have separate author and committer
> > properties, and doesn't store anything but a Unix user ID as
> > attribution.  I don't see any way around this.
> 
> You're not fully informed, then.
> 
> 1) svn:author revprops can contain any UTF-8 string.  They are not
> restricted to Unix user id's.  (For example, they can contain full
> names, if the administrator so chooses.)

Right.  At one point during the development of this feature I was
accidentally storing the full email field in this property.  So I
already knew that this is allowed at some level.  

And, I have no trouble believing that svn log will cheerfully echo
anything that I choose to stuff in that field.  

But...

(1) How much work would it be to set up a Subversion installation
so that when I svn commit, the tool does the right thing, e.g. puts
a DVCS-style fullname/email string in there?  

(2) Have the tools been tested for bugs arising from having whitespace
in that data?

Really, if it's actually easy to set up DVCS-style globally unique IDs you
Subversion guys ought to be shouting it from the housetops.  The absence
of this capability is a serious PITA in several situations, including 
for example migrating projects between forges.

RFC: If I wrote a patch that let Subversion users set their own
content string for the author field in ~/.subversion/config, would
you merge it?  Because I'd totally write that.

> 2) You can define custom revision properties.  In your case, the easiest
> way would be to set a reposurgeon:author property, alongside the
> svn:author property.

Yeah, sure, I've assumed all along this wouldn't break if I tried it.
If I actually thought you guys were capable of designing a data model
with a perfectly general-looking store of key/value pairs and then
arbitrarily restricting the key set so I couldn't do that, I'd almost
have to find each and every one of you and kick your asses into next
Tuesday on account of blatant stupidity. I have no such plans :-).

But...what good does this capability do?  OK, it would assist
round-tripping back to gitspace, but while that's kind of cool I don't
see any help for a normal Subversion workflow here.
 
> You might also seek community consensus to reserve an svn:foo name for
> the "original author" property --- perhaps svn:original-author --- so
> that reposurgeon and other git->svn tools can interoperate in the way
> they transfer the "original author" information.

OK.  But I like the idea of letting the users set their own author
content string better.  Instead of another layer of kluges, why
shouldn't Subversion join the DVCSes in the happy land of
Internet-scoped attributions?

> How does reposurgeon handle empty directories with (node) properties?

Currently by ignoring all of them except svn:ignore, which it turns 
into .gitignore content on the gitspace side.  And now vice-versa, too.

Not clear what else it *could* do.  I'd take suggestions.
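For reference, one possible translation of svn:ignore into .gitignore syntax is sketched below in Python. It is hypothetical and not necessarily what reposurgeon does: it writes everything into a single root .gitignore and assumes surrounding whitespace in pattern lines is insignificant. The key fact it relies on is that svn:ignore applies only to the immediate children of the directory carrying it, so each pattern is anchored to that directory. Subversion's and git's glob dialects also differ in corner cases, which this sketch does not address.

```python
def svn_ignore_to_gitignore(directory, prop_value):
    """Translate one directory's svn:ignore value into lines for a root .gitignore.

    A leading '/' anchors a gitignore pattern, and '*' never matches across
    '/', so '/dir/*.o' matches dir/x.o but not dir/sub/x.o, which mirrors the
    non-recursive scope of svn:ignore.
    """
    base = directory.strip("/")
    prefix = "/" + base + "/" if base else "/"
    lines = []
    for raw in prop_value.splitlines():
        pattern = raw.strip()       # assumption: surrounding whitespace is insignificant
        if pattern:                 # skip blank lines in the property value
            lines.append(prefix + pattern)
    return lines


print(svn_ignore_to_gitignore("trunk/src", "*.o\n\nbuild\n"))
# ['/trunk/src/*.o', '/trunk/src/build']
```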

> >   Subversion has a concept of "flows"; that is, named segments of
> >   history corresponding to files or directories that are created when
> >   the path is added, cloned when the path is copied, and deleted when
> >   the path is deleted. This information is not preserved in import
> >   streams or the internal representation that reposurgeon uses.  Thus,
> >   after editing, the flow boundaries of a Subversion history may be
> >   arbitrarily changed.
> > 
> > This is me being obsessive about documenting the details.  I think it
> > is doubtful that most Subversion users even know flows exist.
> 
> I think you're saying that adds might turn into copies, and vice-versa.
> That is something users would notice --- it is certainly exposed in the
> UI --- even though node-id's are not exposed to clients.

I'm saying nobody thinks of flows when they do branch copies.  It's
not just that users don't see node IDs, it's that no part of most users'
mental model of how Subversion works resembles them.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

Re: reposurgeon now writes Subversion repositories

Posted by Branko Čibej <br...@wandisco.com>.
On 29.11.2012 08:58, Daniel Shahaf wrote:
> I think you're saying that adds might turn into copies, and
> vice-versa. That is something users would notice --- it is certainly
> exposed in the UI --- even though node-id's are not exposed to clients. 

... yet. But there are plans underway to expose them.

-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com


Re: reposurgeon now writes Subversion repositories

Posted by Daniel Shahaf <da...@elego.de>.
Eric S. Raymond wrote on Thu, Nov 29, 2012 at 00:59:45 -0500:
>   In summary, Subversion repository histories do not round-trip through
>   reposurgeon editing. File content changes are preserved but some
>   metadata is unavoidably lost.  Furthermore, writing out a DVCS history
>   in Subversion also loses significant portions of its metadata.
> 
>   Writing a Subversion repository or dump stream discards author
>   information, the committer's name, and the hostname part of the commit
>   address; only the commit timestamp and the local part of the
>   committer's email address are preserved, the latter becoming the
>   Subversion author field.  However, reading a Subversion repository and
>   writing it out again will preserve the author fields.
> 
> Subversion's metadata doesn't have separate author and committer
> properties, and doesn't store anything but a Unix user ID as
> attribution.  I don't see any way around this.

You're not fully informed, then.

1) svn:author revprops can contain any UTF-8 string.  They are not
restricted to Unix user id's.  (For example, they can contain full
names, if the administrator so chooses.)

2) You can define custom revision properties.  In your case, the easiest
way would be to set a reposurgeon:author property, alongside the
svn:author property.

You might also seek community consensus to reserve an svn:foo name for
the "original author" property --- perhaps svn:original-author --- so
that reposurgeon and other git->svn tools can interoperate in the way
they transfer the "original author" information.

I note that one can set revision properties at commit time:

    svn commit -m logmsg --with-revprop svn:original-author="Patch Submitter <fo...@bar.example>"
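To make the trade-off concrete, the following hypothetical Python sketch contrasts the lossy mapping described in the quoted manual text (only the UTC timestamp and the local part of the email survive, the latter as svn:author) with the suggestion of carrying the full identity in an extra revision property. The function names and the regex for a git fast-import identity line are assumptions, svn:original-author is only a proposal in this thread rather than a reserved name, and commit_argv() merely builds an argument list without running svn.

```python
import re
from datetime import datetime, timezone

# A fast-import identity line: "Name <email> epoch-seconds +hhmm"
IDENT = re.compile(r"^(?P<name>.*?)\s*<(?P<email>[^>]*)>\s+(?P<secs>\d+)\s+[+-]\d{4}$")


def parse_ident(ident):
    m = IDENT.match(ident)
    if m is None:
        raise ValueError(f"not a fast-import identity: {ident!r}")
    return m.group("name"), m.group("email"), int(m.group("secs"))


def lossy_revprops(ident):
    """Keep only what survives the lossy mapping: timestamp and email local part."""
    _name, email, secs = parse_ident(ident)
    when = datetime.fromtimestamp(secs, tz=timezone.utc)
    return {
        "svn:author": email.split("@", 1)[0],
        # svn:date is UTC in ISO 8601 form with microseconds and a trailing Z;
        # the original timezone offset is discarded
        "svn:date": when.strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
    }


def commit_argv(message, ident, prop="svn:original-author"):
    """Build an 'svn commit' argv that also records the full identity in a revprop."""
    name, email, _secs = parse_ident(ident)
    return ["svn", "commit", "-m", message,
            "--with-revprop", f"{prop}={name} <{email}>"]


ident = "Jane Doe <jdoe@example.org> 1354168785 -0500"   # hypothetical identity
print(lossy_revprops(ident))
# {'svn:author': 'jdoe', 'svn:date': '2012-11-29T05:59:45.000000Z'}
print(commit_argv("logmsg", ident))
```

Passing the resulting list to subprocess.run() rather than through a shell sidesteps quoting of the spaces and angle brackets; whether the server accepts the extra revision property still depends on its hook configuration.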

>   Empty directories aren't represented in import streams. Consequently,
>   reading and writing Subversion repositories preserves file content,
>   but not empty directories.  It is also not guaranteed that, after
>   editing a Subversion repository, the sequence of directory
>   creations and deletions relative to other operations will be
>   identical; the only guarantee is that enclosing directories will be
>   created before any files in them are.
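The one ordering guarantee quoted above, that enclosing directories are created before the files in them, can be derived purely from file paths. The Python sketch below is a hypothetical illustration of that idea, not reposurgeon's code; because its only input is a list of files, it also shows why an empty directory can never be reconstructed this way.

```python
from pathlib import PurePosixPath


def dirs_to_create(file_paths, existing=()):
    """List directories to add before the given files, outermost first.

    Only directories that enclose some file are produced, so an empty
    directory can never appear in the output.
    """
    have = set(existing)
    needed = []
    for path in file_paths:
        # PurePosixPath("a/b/c").parents yields a/b, a, . ; reversed gives ., a, a/b
        for parent in reversed(PurePosixPath(path).parents):
            d = str(parent)
            if d != "." and d not in have:
                have.add(d)
                needed.append(d)
    return needed


print(dirs_to_create(["trunk/src/main.c", "trunk/README", "trunk/src/util/x.h"]))
# ['trunk', 'trunk/src', 'trunk/src/util']
```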

How does reposurgeon handle empty directories with (node) properties?

% svnadmin create r
% svnmucc -mm -U file://$PWD/r mkdir foo propset k v foo

>   Subversion has a concept of "flows"; that is, named segments of
>   history corresponding to files or directories that are created when
>   the path is added, cloned when the path is copied, and deleted when
>   the path is deleted. This information is not preserved in import
>   streams or the internal representation that reposurgeon uses.  Thus,
>   after editing, the flow boundaries of a Subversion history may be
>   arbitrarily changed.
> 
> This is me being obsessive about documenting the details.  I think it
> is doubtful that most Subversion users even know flows exist.
> 

I think you're saying that adds might turn into copies, and vice-versa.
That is something users would notice --- it is certainly exposed in the
UI --- even though node-id's are not exposed to clients.

> 

Cheers

Daniel