Posted to dev@subversion.apache.org by Ben Collins-Sussman <su...@collab.net> on 2002/12/11 22:43:24 UTC

repository GUIDs

mbk says he wants to take on the repository GUID issue.

My question to the list is: should the GUID be attached to the
filesystem (accessed via libsvn_fs), as an unversioned rev 0 property?
Or should it be a general repository identifier (accessed via
libsvn_repos), sitting in normal filespace (like, next to the 'format'
file)?

I can't think of any obvious advantages or disadvantages one way or
the other.  But I'm posting the question to the list.  Maybe I'm
missing something obvious...

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: repository GUIDs

Posted by mark benedetto king <bk...@answerfriend.com>.
On Wed, Dec 11, 2002 at 05:38:32PM -0800, Bill Tutt wrote:
> 
> > From: Branko Cibej [mailto:brane@xbc.nu]
> > Yes. Changing a unique ID is always a bad idea. Ben, d'you think you can
> > make that one property read-only? I'm afraid it probably means adding a
> > rudimentary access-control mechanism for revision props.
> 
> While Brane makes a great comment about ACLs and revision properties,
> I'd like to point out at the filesystem level that the best place to
> stick this information is in a new BDB table. Even if you expose the new
> table through the ra layers as a revision property, we're still further
> ahead of the game if we add a new BDB table for this. 
> 
> The new table is very simple. All it needs at the moment are two
> columns:
> RepositoryID and GUID. RepositoryID is just another one of our fun
> monotonically increasing ID fields, and the GUID column is the
> repository GUID. The reason for structuring the data this way is that
> eventually we'll want to widen at least the NodeRevision primary key by
> RepositoryID. We don't want to widen the NodeRevision PK by the
> repository GUID mainly because GUIDs cluster so poorly on indices. No
> need to waste valuable page &/or index space.
> 
> FYI,
> Bill

So, I think there are two decisions:

1.) where to put it.
    a.) a distinguished file
    b.) a distinguished revision property
    c.) an fs table

2.) how to get it.
    a.) new ra function
    b.) through existing ra revision props interface

(1a + 2a) is simple and easy to implement, and strikes me
          as unlikely to break anything.

(1a + 2b) will require hacking up the rev-prop codepath
          to catch access to svn:uuid on rev 0 and redirect
          access to the file.

(1b + 2a) requires extending the ra vtable, which will touch
          a lot of things, just to implement a read-only
          property.  Of all the combinations, this one strikes
          me as the least attractive.

(1b + 2b) will require hacking up the rev-propset codepath
          to deny writes to svn:uuid on rev 0.  This is probably
          the "least code required" option.

(1c + 2a)
(1c + 2b) These are roughly equivalent to those with 1a, with
          the following exceptions:
            1.) it will be more work
            2.) this work is good work, in that it moves us
                towards a data model that will support
                distributed repositories.


My personal leaning is towards (1c + 2a).
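
To make the "deny writes to svn:uuid on rev 0" idea in (1b + 2b) concrete,
here is a minimal sketch in Python rather than Subversion's C.  It is an
illustration only: RevPropStore, set_revprop, get_revprop and
ReadOnlyPropertyError are hypothetical names, not part of any Subversion
API.  Only the property name svn:uuid and revision 0 come from the
discussion above.

```python
"""Sketch: refuse writes to one distinguished revision property on rev 0."""

# (revision, property name) pairs that are treated as read-only.
PROTECTED = {(0, "svn:uuid")}


class ReadOnlyPropertyError(Exception):
    """Raised when a caller tries to change a protected revision property."""


class RevPropStore:
    """Hypothetical in-memory stand-in for unversioned revision properties."""

    def __init__(self, uuid):
        # The UUID is written once, when the repository is created.
        self._props = {(0, "svn:uuid"): uuid}

    def get_revprop(self, rev, name):
        return self._props.get((rev, name))

    def set_revprop(self, rev, name, value):
        # The single extra check the propset codepath would need.
        if (rev, name) in PROTECTED:
            raise ReadOnlyPropertyError(f"{name} on r{rev} is read-only")
        self._props[(rev, name)] = value
```

Reads go through the ordinary property getter, so nothing changes for
clients that only want to look the value up.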


--ben



Re: repository GUIDs

Posted by Branko Čibej <br...@xbc.nu>.
Bill Tutt wrote:

>>From: Branko Cibej [mailto:brane@xbc.nu]
>>Yes. Changing a unique ID is always a bad idea. Ben, d'you think you can
>>make that one property read-only? I'm afraid it probably means adding a
>>rudimentary access-control mechanism for revision props.
>
>While Brane makes a great comment about ACLs and revision properties,
>I'd like to point out at the filesystem level that the best place to
>stick this information is in a new BDB table. Even if you expose the new
>table through the ra layers as a revision property, we're still further
>ahead of the game if we add a new BDB table for this.
>
>The new table is very simple. All it needs at the moment are two
>columns:
>RepositoryID and GUID. RepositoryID is just another one of our fun
>monotonically increasing ID fields, and the GUID column is the
>repository GUID. The reason for structuring the data this way is that
>eventually we'll want to widen at least the NodeRevision primary key by
>RepositoryID. We don't want to widen the NodeRevision PK by the
>repository GUID mainly because GUIDs cluster so poorly on indices. No
>need to waste valuable page &/or index space.

I find myself agreeing with Bill yet again. :-)
But perhaps we can work out a compromise: _this_ repository's GUID is a
special case, and we can always map it (later) to RepositoryID 0. Which
means that we don't actually need a new BDB table at this very moment --
since we won't be widening the NodeRevision PK now -- but we _do_ need a
way to store the GUID and read it.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: repository GUIDs

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
"Bill Tutt" <ra...@lyra.org> writes:
> The new table is very simple. All it needs at the moment are two
> columns:
> RepositoryID and GUID. RepositoryID is just another one of our fun
> monotonically increasing ID fields, and the GUID column is the
> repository GUID. The reason for structuring the data this way is that
> eventually we'll want to widen at least the NodeRevision primary key by
> RepositoryID. We don't want to widen the NodeRevision PK by the
> repository GUID mainly because GUIDs cluster so poorly on indices. No
> need to waste valuable page &/or index space.

What's the purpose of RepositoryID (as separate from GUID)?


RE: repository GUIDs

Posted by Bill Tutt <ra...@lyra.org>.
> From: Branko Cibej [mailto:brane@xbc.nu]
> Yes. Changing a unique ID is always a bad idea. Ben, d'you think you can
> make that one property read-only? I'm afraid it probably means adding a
> rudimentary access-control mechanism for revision props.

While Brane makes a great comment about ACLs and revision properties,
I'd like to point out at the filesystem level that the best place to
stick this information is in a new BDB table. Even if you expose the new
table through the ra layers as a revision property, we're still further
ahead of the game if we add a new BDB table for this. 

The new table is very simple. All it needs at the moment are two
columns:
RepositoryID and GUID. RepositoryID is just another one of our fun
monotonically increasing ID fields, and the GUID column is the
repository GUID. The reason for structuring the data this way is that
eventually we'll want to widen at least the NodeRevision primary key by
RepositoryID. We don't want to widen the NodeRevision PK by the
repository GUID mainly because GUIDs cluster so poorly on indices. No
need to waste valuable page &/or index space.
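
As a rough illustration of the two-column idea, here is a minimal Python
sketch, not BDB code.  RepositoryTable, intern and guid_for are
hypothetical names, and the node id and revision in the example key are
made-up values.  It only shows the indirection: a GUID is stored once, and
everything else refers to it by a small, monotonically increasing integer.

```python
"""Sketch: map each repository GUID to a small integer RepositoryID."""
import uuid


class RepositoryTable:
    """Hypothetical stand-in for the proposed RepositoryID/GUID table."""

    def __init__(self):
        self._guid_by_id = {}  # RepositoryID -> GUID
        self._id_by_guid = {}  # GUID -> RepositoryID
        self._next_id = 0      # monotonically increasing counter

    def intern(self, guid):
        """Return the RepositoryID for a GUID, allocating one if needed."""
        if guid not in self._id_by_guid:
            self._id_by_guid[guid] = self._next_id
            self._guid_by_id[self._next_id] = guid
            self._next_id += 1
        return self._id_by_guid[guid]

    def guid_for(self, repository_id):
        """Look the full GUID back up from the small integer."""
        return self._guid_by_id[repository_id]


table = RepositoryTable()
local_id = table.intern(str(uuid.uuid4()))
# A widened node-revision key would carry the small integer, not the GUID.
example_key = (local_id, "node-17", 42)  # (RepositoryID, node id, revision)
```

The full GUID is only needed when talking to another repository; inside
the filesystem the key stays short.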

FYI,
Bill



Re: repository GUIDs

Posted by Branko Čibej <br...@xbc.nu>.
Peter Davis wrote:

>On Wednesday 11 December 2002 15:10, mark benedetto king wrote:
>>I'm not sure exactly what you mean; relocate wouldn't need to change it,
>>just refer to it.
>
>Right, but if someone were to change it themselves, it could do funny things
>to relocate.  If it's implemented as a revision property, then someone could
>change it accidentally (bad), or change it on purpose, say, to make one
>repository masquerade as another by copying the GUID (possibly good or bad).
>
>Aside from that, I think it makes more sense as a revision property.  But
>possibly it should be read-only, or else people would probably want/need to
>tighten the access controls.

Yes. Changing a unique ID is always a bad idea. Ben, d'you think you can
make that one property read-only? I'm afraid it probably means adding a
rudimentary access-control mechanism for revision props.


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: repository GUIDs

Posted by Peter Davis <pe...@pdavis.cx>.
On Wednesday 11 December 2002 15:10, mark benedetto king wrote:
> I'm not sure exactly what you mean; relocate wouldn't need to change it,
> just refer to it.

Right, but if someone were to change it themselves, it could do funny things
to relocate.  If it's implemented as a revision property, then someone could
change it accidentally (bad), or change it on purpose, say, to make one
repository masquerade as another by copying the GUID (possibly good or bad).

Aside from that, I think it makes more sense as a revision property.  But
possibly it should be read-only, or else people would probably want/need to
tighten the access controls.

-- 
Peter Davis



Re: repository GUIDs

Posted by mark benedetto king <bk...@answerfriend.com>.
On Wed, Dec 11, 2002 at 03:00:09PM -0800, Peter Davis wrote:
> On Wednesday 11 December 2002 14:43, Ben Collins-Sussman wrote:
> > should the GUID be attached to the
> > filesystem (accessed via libsvn_fs), as an unversioned rev 0 property?
> 
> Just wondering, what would happen if someone changed that property?  I can 
> think of reasons why it would be good to allow changing it, and reasons why 
> it would be bad.  What (besides a "relocate" command) would rely on this?

I'm not sure exactly what you mean; relocate wouldn't need to change it,
just refer to it.

Adding the UUID in the repos gives two major benefits:

    1.) it lets the client validate that, barring malevolence, the
        repository its URLs point to is the one that was there before

and

    2.) it lets clients and servers use UUID/Node/Rev triplets to
        uniquely identify the chunks of data that they exchange; this
        is an important step in building a distributed repository
        scheme.
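
Benefit 1 can be sketched roughly as follows, in Python rather than C and
assuming the client has recorded the UUID alongside the URL.  The entry
dict, the fetch_uuid callable, relocate and UUIDMismatch are all
hypothetical names used for illustration, not an existing client
interface.

```python
"""Sketch: only relocate when the new URL serves the same repository."""


class UUIDMismatch(Exception):
    """The repository at the new URL is not the one recorded locally."""


def relocate(entry, new_url, fetch_uuid):
    """Return a copy of entry pointing at new_url if the UUIDs agree.

    entry      -- dict with 'url' and 'uuid' keys (hypothetical WC record)
    fetch_uuid -- callable returning the UUID served at a URL (hypothetical)
    """
    remote = fetch_uuid(new_url)
    if remote != entry["uuid"]:
        raise UUIDMismatch(f"expected {entry['uuid']}, got {remote}")
    # The stored UUID is only referred to; relocate never changes it.
    return {**entry, "url": new_url}
```

Note that the old URL is never contacted, so the check still works when
the original location is gone.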

> 
> If you store the GUID in a new file, then only someone with admin access to 
> the repository's server can change it.  People might want to tighten their 
> pre-commit access control scripts to prevent normal committers from changing 
> a revision property (either intentionally or accidentally), especially 
> because it is unversioned.

Yes.  This is a very important point, IMO.


--ben



Re: repository GUIDs

Posted by Peter Davis <pe...@pdavis.cx>.
On Wednesday 11 December 2002 14:43, Ben Collins-Sussman wrote:
> should the GUID be attached to the
> filesystem (accessed via libsvn_fs), as an unversioned rev 0 property?

Just wondering, what would happen if someone changed that property?  I can
think of reasons why it would be good to allow changing it, and reasons why
it would be bad.  What (besides a "relocate" command) would rely on this?

If you store the GUID in a new file, then only someone with admin access to
the repository's server can change it.  People might want to tighten their
pre-commit access control scripts to prevent normal committers from changing
a revision property (either intentionally or accidentally), especially
because it is unversioned.

-- 
Peter Davis



Re: repository GUIDs

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
Greg Hudson <gh...@MIT.EDU> writes:
> If it's a rev 0 property, then it's available to clients through an
> existing interface.  If it's a separate file, then we'll need a new
> interface.  This seems like the kind of thing properties were created
> for.

Ignore my post just now about how libsvn_repos is the right place --
Greg's got a good point :-).



Re: repository GUIDs

Posted by cm...@collab.net.
Greg Hudson <gh...@MIT.EDU> writes:

> On Wed, 2002-12-11 at 17:43, Ben Collins-Sussman wrote:
> > I can't think of any obvious advantages or disadvantages one way or
> > the other.  But I'm posting the question to the list.  Maybe I'm
> > missing something obvious...
> 
> If it's a rev 0 property, then it's available to clients through an
> existing interface.  If it's a separate file, then we'll need a new
> interface.  This seems like the kind of thing properties were created
> for.

Of course, it also means it's editable unless some magic code exists
to say it's not.  I dunno if that is good or bad--it just is.


Re: repository GUIDs

Posted by mark benedetto king <bk...@answerfriend.com>.
On Wed, Dec 11, 2002 at 05:48:13PM -0500, Greg Hudson wrote:
> On Wed, 2002-12-11 at 17:43, Ben Collins-Sussman wrote:
> > I can't think of any obvious advantages or disadvantages one way or
> > the other.  But I'm posting the question to the list.  Maybe I'm
> > missing something obvious...
> 
> If it's a rev 0 property, then it's available to clients through an
> existing interface.  If it's a separate file, then we'll need a new
> interface.  This seems like the kind of thing properties were created
> for.
> 

I think it's reasonable to hang something like this off of
rev 0 properties, and it will be less work to do it this way
than ripple the changes through all the RA implementations.

Moving past where the UUIDs are stored in the repository,
should they be stored in the WC?  I think so, since 
svn switch --relocate needs to work when the original URL
is no longer valid.

It probably makes sense, then, to store the UUID associated
with every repository reference; i.e. every URL in the entries-file.
I think this will be necessary and sufficient.  Am I overlooking
anything?

--ben



Re: repository GUIDs

Posted by Greg Hudson <gh...@MIT.EDU>.
On Wed, 2002-12-11 at 17:43, Ben Collins-Sussman wrote:
> I can't think of any obvious advantages or disadvantages one way or
> the other.  But I'm posting the question to the list.  Maybe I'm
> missing something obvious...

If it's a rev 0 property, then it's available to clients through an
existing interface.  If it's a separate file, then we'll need a new
interface.  This seems like the kind of thing properties were created
for.



Re: repository GUIDs

Posted by cm...@collab.net.
Karl Fogel <kf...@newton.ch.collab.net> writes:

> Ben Collins-Sussman <su...@collab.net> writes:
> > mbk says he wants to take on the repository GUID issue.
> > 
> > My question to the list is: should the GUID be attached to the
> > filesystem (accessed via libsvn_fs), as an unversioned rev 0 property?
> > Or should it be a general repository identifier (accessed via
> > libsvn_repos), sitting in normal filespace (like, next to the 'format'
> > file)?
> > 
> > I can't think of any obvious advantages or disadvantages one way or
> > the other.  But I'm posting the question to the list.  Maybe I'm
> > missing something obvious...
> 
> Our general principle has been, keep it out of the fs if it doesn't
> have to be part of the fs.  So I lean toward libsvn_repos (next to the
> format file seems like a good idea).

+1.

Should we just add a new file at /path/to/repos/id that contains the GUID?
(It would be a sibling to the 'format' file.)
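
A minimal sketch of that, in Python rather than C and purely for
illustration: ensure_repos_id is a hypothetical helper, and the one-line
file layout is an assumption, not a decided format.  It creates the 'id'
file once and afterwards only reads it back.

```python
"""Sketch: keep the repository GUID in an 'id' file in the repos directory."""
import os
import tempfile
import uuid


def ensure_repos_id(repos_dir):
    """Return the repository GUID, creating the 'id' file on first use."""
    path = os.path.join(repos_dir, "id")
    try:
        # Mode "x" fails if the file exists, so an ID is never overwritten.
        with open(path, "x") as f:
            guid = str(uuid.uuid4())
            f.write(guid + "\n")
            return guid
    except FileExistsError:
        with open(path) as f:
            return f.read().strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repos:
        first = ensure_repos_id(repos)
        # A second call reads the same value back instead of minting a new one.
        assert ensure_repos_id(repos) == first
```

Since it is an ordinary file, changing it requires filesystem access to
the server, not just commit access.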


Re: repository GUIDs

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
Ben Collins-Sussman <su...@collab.net> writes:
> mbk says he wants to take on the repository GUID issue.
> 
> My question to the list is: should the GUID be attached to the
> filesystem (accessed via libsvn_fs), as an unversioned rev 0 property?
> Or should it be a general repository identifier (accessed via
> libsvn_repos), sitting in normal filespace (like, next to the 'format'
> file)?
> 
> I can't think of any obvious advantages or disadvantages one way or
> the other.  But I'm posting the question to the list.  Maybe I'm
> missing something obvious...

Our general principle has been, keep it out of the fs if it doesn't
have to be part of the fs.  So I lean toward libsvn_repos (next to the
format file seems like a good idea).

That also makes it more transparent for repository administrators.
