Posted to dev@subversion.apache.org by Bruce Atherton <br...@hagbard.flair.law.ubc.ca> on 2001/02/27 02:37:50 UTC

A few ideas concerning Subversion

Eric's message concerning the lack of discussion around architecture has 
prompted me to post concerning some ideas that I think are important for 
Subversion. I've been holding off on posting these ideas because I didn't 
want to interrupt the flow of work that has been going on, but I think he 
is right in that there are a few things that need to at least be reviewed 
while the work continues.

I apologize to everyone for hashing over ideas that may have been done to 
death, but I am a little late for the party and missed much of what has 
gone before. If nothing else, perhaps these will point the way to some 
entries in a FAQ.

WARNING: The following is lengthy. Proceed at your own risk.

CONSIDERATIONS FOR DESIGN OF SUBVERSION

a) Adding a CVS proxy

There have been many source code management tools that have come and gone. 
We know that Subversion will eventually catch on based on its WebDAV/DeltaV 
interface, CVS feature set, and integration with Apache, but in the 
changeover time there is a chicken-and-egg problem.

The problem is not on the server side. Anyone that knows how to run a CVS 
server probably also knows how to drop a module into Apache, and with the 
conversion scripts the process, although one way, should be simple. The 
only reticence that server operators are going to show is in their loss of 
all the people with CVS clients.

CVS clients have finally become somewhat ubiquitous. Most IDEs provide at 
least some support for CVS built in, some of them quite sophisticated. We 
are finally at the point that SCC has been at for years. Yes, the client 
library will simplify porting new clients into new environments, but it 
still must be admitted that this process will take some time.

My idea is to write a new server that does nothing but receive CVS pserver 
protocol requests from a client and translates them into an SVN request, 
and translates the response back to the client. In Design Pattern terms, a 
Mediator. Of course, the two models are fundamentally different and only a 
fool would think they are going to work perfectly together, but consider 
this. The number one use of CVS on the public Internet is to check out a 
copy of the source code of some project anonymously. If we could support 
just this one feature, we would be supporting 80% (93.7% of all stats are 
made up) of the actions of users on the Internet. And I think we could go 
much farther than that.

Perhaps the CVS proxy would have to store local data that was eventually 
tossed out. Perhaps it would create SVN hierarchies as regular files for 
the CVS client. Perhaps the Subversion server would be forced to put in 
place certain types of properties in aid of the translation to CVS. In any 
case, we should be able to accomplish some rough equivalence that is 
suitable for some percentage of users.

So how to create this Proxy? I can think of several ways:

   - Existing CVS source code could be linked into the new libraries (ugh!)

   - Apache 2.0 is supposed to have better support for implementing new 
protocols, so in theory we could write a MOD_PSERVER. In practice, I've 
looked at the one non-HTTP based protocol they support (mod_echo) and it is 
unclear to me how it would be done. Perhaps it was easy, but the 
documentation wasn't there for me to know.

   - We could use Avalon. It is written in Java, though, so there would be 
an issue with how the Subversion libraries would be supported.
     * Use JNI to get at the libraries
     * rewrite the libraries in Java
     * Translate the CVS pserver request into a WebDAV/DeltaV request, and 
reissue it to the URL of your choice. This seems the most generic solution.

There is another possibility here if we were to support commits. Since 
branches in Subversion are such lightweight options, clients coming in 
through an anonymous CVS proxy could automatically be redirected by the 
proxy to a branch, perhaps one based on their email address. That way, 
changes could go into the repository and anyone, at any time, could review 
them at their leisure without worrying about anything getting screwed up. 
That may be too insecure for some folk, I don't know.

So what do people think about that idea? Is the "marketing" slant correct? 
Are there other ways of accomplishing the same task that I haven't 
considered? Are the ones I have considered easier or harder than I think 
they are?

I'm thinking that developing this could be my initial contribution to 
Subversion, so I'd appreciate any points of view out there.

The path to Subversion is a slippery slope (or it should be).

b) Drop in replacement for CVS

There are many sysadmins with fairly sophisticated scripts for integrating 
the CVS server into their code management and release strategy. It would be 
awfully nice if Subversion knew how to interpret all of the standard files 
in the CVSROOT directory and just worked with them. Perhaps this could be a 
Subversion add-on module in the Apache style.

c) Linking to Zope

Zope is a great application framework that already supports WebDAV. It is 
written in Python which has an easy mechanism for calling into C code, and 
it already supports many different types of data access. Its own object 
database is versioned. This suggests to me that asking the Digital 
Creations people to support DeltaV and providing an alternative backend for 
Zope could result in some quick wins for everyone, once the Subversion file 
system becomes a little more stable.

d) Wiki

At this stage in Subversion's life, it looks to me like it would be 
beneficial to offer all of the documentation up on the web in some forum 
that encourages collaborative editing, such as a Wiki. Periodically, a 
script could suck the material off of the Wiki and check it in (or the Wiki 
could use Subversion for its file store).

We should all make a concerted effort to keep any documentation on 
Subversion as up-to-date as possible. The documentation that exists now is 
really lacking, but with a communal editing effort could be kept in sync 
with the code.

e) Mount points in the file system

I have read in a recent message that sub-repositories were argued to death 
and summarily rejected, but I'm at a bit of a loss to understand why. It 
doesn't SEEM to have much impact on the file structure (though I am 
probably missing something), and the existence of the APR, NEON, and 
EXPAT-LITE subrepositories within this repository certainly shows its 
desirability.

Imagine an entity that is not a file or a directory, but a mount. It points 
(with a URL) to a particular point within another repository, and includes 
the full revision information so that, no matter what is done to the remote 
subrepository, the revision information in our local copy identifies the 
same state of the remote repository tree.

Users checking out your repository are redirected, when they come to the 
mount link, to get the subrepository information from the other repository, 
using the original revision information. Files checked out through the 
remote repository would be read-only and marked as non-committable. That is, 
the tree of files from the remote repository would be treated in the same 
way that files checked out with a fixed, non-branch tag are in CVS. No one 
could check anything into the subrepository.

Updates to a mount point would happen only through a special administrative 
command, to keep the library you are relying on stable until you say you 
want to update to a later version.
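
The mount entry described above can be sketched as a small data structure. 
This is only an illustration in Python: the names MountPoint, resolve, 
can_commit and admin_update are hypothetical and not part of any Subversion 
API, and the layout of the entry is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class MountPoint:
    """A tree entry that is neither a file nor a directory (hypothetical)."""
    local_path: str          # where the mount appears in our own tree
    remote_url: str          # a point inside the other repository
    revision: int            # pinned revision of the remote tree
    read_only: bool = True   # files under a mount cannot be committed


def resolve(mount: MountPoint) -> Tuple[str, int]:
    """Return what a client should fetch when it reaches the mount."""
    return (mount.remote_url, mount.revision)


def can_commit(mount: MountPoint, path: str) -> bool:
    """Paths beneath a mount behave like a fixed CVS tag: no commits."""
    inside = (path == mount.local_path
              or path.startswith(mount.local_path + "/"))
    return not (inside and mount.read_only)


def admin_update(mount: MountPoint, new_revision: int) -> MountPoint:
    """The pinned revision changes only through an explicit admin action."""
    return replace(mount, revision=new_revision)
```

Because the entry is immutable, an ordinary checkout or update can never 
move the pinned revision; only admin_update produces a new entry.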

Now, I can see that there might be authentication issues here, but I would 
think that they could be worked around (to the point of saying it just 
doesn't work unless there is coordination between the repository owners). 
This is simply too important an issue to be flushed without making 
extremely sure it would be more hassle than it is worth, in my opinion.

f) Staying flexible enough for alternate uses

I am a little leery about forcing SVN clients to implement a certain 
behaviour. It can shut down uses on you if you aren't careful.

For example, consider a client that runs every night and backs up all 
changes on my hard drive to a SVN server. That way, I can at any time 
upgrade my hardware and restore my new machine to the same state as my old 
machine in a relatively short time. In this scenario, I really don't want a 
mirror of my hard drive littered across many SVN directories. My tradeoffs 
here are much different from the standard source code control system. I am 
probably not restoring that often, and don't mind a major speed hit when it 
happens. I probably want to keep my own custom table of file sizes and 
dates to speed the backup, though, along with perhaps a checksum.
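
A rough sketch of such a client-side table, in Python. The function names 
and the table layout (path mapped to size, modification time and SHA-1 
digest) are assumptions made for illustration; nothing here talks to a 
Subversion server.

```python
import hashlib
import os


def file_digest(path, chunk_size=65536):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def files_to_back_up(root, table):
    """Return paths under root whose content changed; update table in place.

    table maps path -> (size, mtime_ns, digest) from the previous run.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            old = table.get(path)
            # Cheap check first: same size and date means skip the file.
            if (old is not None and old[0] == st.st_size
                    and old[1] == st.st_mtime_ns):
                continue
            # Size or date differ, so pay for a checksum to be sure.
            digest = file_digest(path)
            if old is None or old[2] != digest:
                changed.append(path)
            table[path] = (st.st_size, st.st_mtime_ns, digest)
    return changed
```

The size and date comparison keeps the nightly run fast, and the checksum 
is computed only for files that look different.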

Another example could be a Linux file system implemented on top of the SVN 
file system. New tools could allow you to switch revisions of a file or 
tree, or switch to another branch. Every time you closed a file, it would 
be checked for whether it needed to be checked in. In fact, one could 
imagine a Redhat distribution and a Debian distribution on the same system. 
If properly done, it could understand that files such as /etc/smb.conf on a 
RedHat system are the same as /etc/samba/smb.conf on a Debian system, and 
even allow you to keep the two branches in sync for certain files. Such a 
file system would be transactioned, versioned, and property based (allowing 
maximum flexibility in ACL control). And slow. But for some people, it 
might be the right tradeoff.

My last example would be an NTFS-based file system that kept a remote SVN 
server in sync. Imagine periodically deleting all of your files from your 
hard drive. Any attempt to open a file would first check for a local copy, 
and, failing that, perform a checkout from the SVN server before opening 
it. This would allow you to clean out all the gunk in your hard drive while 
remaining confident that the material was still readily available to you 
(with a slight initial delay for each new file). Of course, you couldn't 
run a Virus Checker on the client but otherwise it could allow you to delay 
upgrading your hard drive for quite a long time.

Ok, just a few pie-in-the-sky ideas but hopefully you get the point. There 
are times that storing too much information client-side is a really bad 
trade-off for the application. While Subversion is initially intended for 
source code control only, is it possible to keep our options open?

g) Modular form for Subversion server

One of the great strengths of Apache is in its modular form, with support 
for adding new modules fairly simply. So what is the plan for modular 
support in Subversion? Subversion is already a mod of Apache, but shouldn't 
it support its own plug-in API specifically for adding on features to 
source code management? I'm thinking of different LOD models, approval 
systems, release states, other tools, etc.

Integrating scripting languages would of course be an important step. 
Perhaps Subversion mods that themselves required Apache mod_perl or 
mod_python to be installed, and that extended those mods to supply data 
structures specific to Subversion.

I once wrote (http://groups.yahoo.com/group/info-cvs/message/15036) a 
series of steps for a CVS server that mimicked the steps of the Apache web 
server. I've excerpted the relevant part here:

 > Here is an example of the steps the CVS server might go through, and that a
 > module could register itself for. Of course, some commands would only go
 > through a subset of the steps:
 >
 >    a) Authorize user (pserver, kerberos, GSSAPI, etc)
 >    b) Identify command (standard commands, grouped commands, add your own)
 >    c) Translate namespace to source namespace, destination namespace (& filter)
 >    d) Verify user can access source &/or destination namespaces
 >    e) For each file/directory
 >       1) Retrieve from source
 >       2) Determine file properties (mime type, binary, revision, tags, etc)
 >       3) Translate file (keywords, promotion level, line-ends, diff, etc)
 >       4) Apply command
 >       5) Generate server-side files (CVSRoot, CVS directory)
 >       6) Send results to Destination
 >       7) Log any file-specific info
 >    f) Log command info

Of course, Apache does many of these things for us, and the design of 
Subversion undoubtedly makes others irrelevant, but perhaps consideration 
could be given as to which steps are still relevant, and how modules could 
hook into them.
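
One way modules might hook into such steps is sketched below in Python. The 
phase names paraphrase the quoted list, and HookRegistry is a hypothetical 
class invented for this example; it is not Apache's hook API or any planned 
Subversion interface.

```python
from collections import defaultdict

# Phase names paraphrase the quoted steps; they are illustrative only.
PHASES = [
    "authorize_user",
    "identify_command",
    "translate_namespace",
    "check_access",
    "process_files",
    "log_command",
]


class HookRegistry:
    """Lets modules register callables for named phases (hypothetical)."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def register(self, phase, func):
        if phase not in PHASES:
            raise ValueError("unknown phase: %s" % phase)
        self._hooks[phase].append(func)

    def run(self, request):
        """Run every registered hook in phase order.

        Each hook receives the request dict and may modify it. A hook
        that returns False rejects the request and stops processing.
        """
        for phase in PHASES:
            for func in self._hooks[phase]:
                if func(request) is False:
                    return False
        return True
```

A module would call register("check_access", my_check) at load time, and 
the server would call run(request) once per command.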

h) Rewriting libs in other languages

Once the API has stabilized, it is important that the ability to access the 
Subversion libraries be available in as many languages as possible. These 
tasks are probably outside the effort on Subversion itself, but hopefully 
volunteers from the other communities can be found to make that effort. For 
a tool like this that will require clients in all kinds of environments, 
language neutrality is a very important feature to strive for from the 
beginning, I think.

i) Server controlled UI for conforming clients

It would be awfully nice if, right from the beginning, support was provided 
for servers to have some control over the user interfaces of the clients. 
This way, as each group customized their source code management system to 
support their particular view of the world, no matter how controlled or how 
lax, the clients could automatically adapt to provide the right user 
interface. This might mean that people filling different roles would see 
completely different user interfaces.

The current standard way to do this kind of server-controlled user 
interface right now is through HTML Forms, but of course they are not 
nearly rich enough to provide a complete user interface. I see three 
possible contenders, each with their own flaws:

     1) XUL - seems to be oriented almost exclusively at web browsers. 
Still, once DeltaV becomes integrated into Mozilla, XUL may allow Mozilla to 
become a versioning client.

     2) UIML - the most general purpose solution, but it seems to be so 
abstract as to provide no actual user interface elements that are usable at 
this time. Perhaps that will change over time.

     3) XForms - this is the option that seems to show the most promise, 
although the standard is still developing.

Anyway, those are my thoughts for now. I'd appreciate hearing from anyone 
who has any reaction at all to what I've written, even if it is a belief that 
I should just go away with these high-faluting ideas for now while the 
serious work is done. I'd particularly like to hear from someone who has 
information or ideas about the CVS proxy idea. Thanks.

Re: A few ideas concerning Subversion

Posted by Jim Blandy <ji...@zwingli.cygnus.com>.

I think the current filesystem interface should provide a nice basis
for a reimplementation of CVS pserver.  Managing the mismatches
between the models will be a challenge, but once you've decided how to
manage the correspondence, libsvn_fs shouldn't get in your way.

Re: A few ideas concerning Subversion

Posted by Greg Stein <gs...@lyra.org>.

On Mon, Feb 26, 2001 at 06:37:50PM -0800, Bruce Atherton wrote:
> Eric's message concerning the lack of discussion around architecture has 
> prompted me to post concerning some ideas that I think are important for 
> Subversion.

Well, that's because most of the architecture is somewhat settled, and we're
now executing on it :-)  Some of the discussion has also happened a bit in
person or on the phone. But I think most of it is kind of a "group gestalt"
and we're just coding it now.

>...
> a) Adding a CVS proxy

[ note: CVS client talking to SVN server ]

>...
> CVS clients have finally become somewhat ubiquitous. Most IDEs provide at 
> least some support for CVS built in, some of them quite sophisticated. We 
> are finally at the point that SCC has been at for years. Yes, the client 
> library will simplify porting new clients into new environments, but it 
> still must be admitted that this process will take some time.

Agreed. However, I don't think anybody has volunteered to write this yet
(nor have we listed it as a 1.0 requirement). But as GregH wrote: we
wouldn't refuse it either :-)

>...
> So how to create this Proxy? I can think of several ways:
> 
>    - Existing CVS source code could be linked into the new libraries (ugh!)

Considering the CVS codebase fragility, this may be a non-starter. I'd hate
to imagine CVS pserver built against libsvn_fs.

>    - Apache 2.0 is supposed to have better support for implementing new 
> protocols, so in theory we could write a MOD_PSERVER. In practice, I've 
> looked at the one non-HTTP based protocol they support (mod_echo) and it is 
> unclear to me how it would be done. Perhaps it was easy, but the 
> documentation wasn't there for me to know.

Quite doable, and Ryan Bloom is actually doing some work this week to
improve the multiprotocol support. While doc will certainly be lacking for a
while, this will be entirely doable.
(it's doable now, but somewhat difficult; Ryan is untangling code so the
 non-HTTP-specific support/framework can be used)

>    - We could use Avalon. It is written in Java, though, so there would be 
> an issue with how the Subversion libraries would be supported.
>      * Use JNI to get at the libraries
>      * rewrite the libraries in Java
>      * Translate the CVS pserver request into a WebDAV/DeltaV request, and 
> reissue it to the URL of your choice. This seems the most generic solution.

Seems doable. I'd go with the Apache approach over Avalon :-)

There is a simpler approach, which is the style used by CVS pserver itself.
Use the inetd network daemon system. Basically, your program is started with
a network socket hooked to stdin/stdout.

Below, you mention language APIs. Assume you have a Perl or Python API to
libsvn_fs. Write a script to suck in pserver requests from stdin, process
against libsvn_fs, then write the results to stdout.

Very clean, very easy to test, and because it is script-based, very quick to
write/test/debug/change. I'd take this approach over all others. Recognize
that we don't need the network processing speed of something like Apache. We
already gain the working set efficiency of libsvn_fs over the CVS datastore.
So you're all set.
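
A minimal sketch of that inetd-style script, in Python. It handles only the
pserver login exchange (BEGIN AUTH REQUEST ... END AUTH REQUEST, answered
with I LOVE YOU or I HATE YOU) for an anonymous user, then hands every
later request line to a backend. The backend object and its handle() method
are hypothetical stand-ins for a language binding to libsvn_fs, which is
assumed here rather than shown.

```python
def handle_auth(lines, out, allowed_roots):
    """Read a pserver login request and answer it.

    The client sends five lines: BEGIN AUTH REQUEST, the repository
    root, the user name, the scrambled password, END AUTH REQUEST.
    """
    header = next(lines, "").rstrip("\n")
    root = next(lines, "").rstrip("\n")
    user = next(lines, "").rstrip("\n")
    _password = next(lines, "").rstrip("\n")  # ignored: anonymous only
    footer = next(lines, "").rstrip("\n")
    ok = (header == "BEGIN AUTH REQUEST"
          and footer == "END AUTH REQUEST"
          and user == "anonymous"
          and root in allowed_roots)
    out.write("I LOVE YOU\n" if ok else "I HATE YOU\n")
    out.flush()
    return ok


def serve(inp, out, backend, allowed_roots):
    """Serve one connection: inp and out are the socket as text streams."""
    lines = iter(inp)
    if not handle_auth(lines, out, allowed_roots):
        return
    for line in lines:
        # Each request is a keyword, optionally followed by an argument.
        request, _, arg = line.rstrip("\n").partition(" ")
        # backend.handle is hypothetical; it would consult libsvn_fs.
        reply = backend.handle(request, arg)
        if reply:
            out.write(reply)
            out.flush()
```

Under inetd the script would end with something like
serve(sys.stdin, sys.stdout, backend, roots), and the same function can be
tested by passing in-memory streams instead of a socket.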

> There is another possibility here if we were to support commits. Since

I can't imagine a way to unify the two repository models well enough to
support this.

>...
> b) Drop in replacement for CVS
> 
> There are many sysadmins with fairly sophisticated scripts for integrating 
> the CVS server into their code management and release strategy. It would be 
> awfully nice if Subversion knew how to interpret all of the standard files 
> in the CVSROOT directory and just worked with them. Perhaps this could be a 
> Subversion add-on module in the Apache style.

Interesting point.

We've already considered (and somebody started) a "cvs" command line written
in Perl which just maps everything to "svn" commands.
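
To illustrate the idea (in Python rather than Perl, and not the script
mentioned above), a wrapper of this kind could start from a simple lookup
table. The mapping below is an assumption made for the example, it ignores
CVS global options, and it passes arguments through unchanged, which a real
wrapper could not do for things like module names versus URLs.

```python
# Hypothetical mapping from cvs subcommands (and aliases) to svn ones.
CVS_TO_SVN = {
    "checkout": "checkout", "co": "checkout",
    "update": "update", "up": "update",
    "commit": "commit", "ci": "commit",
    "add": "add",
    "diff": "diff",
    "log": "log",
    "status": "status",
}


def translate(args):
    """Turn ['ci', 'foo.c'] into ['svn', 'commit', 'foo.c'].

    args is the cvs command line without the program name.
    """
    if not args:
        raise ValueError("no cvs subcommand given")
    subcommand = args[0]
    if subcommand not in CVS_TO_SVN:
        raise ValueError("no svn equivalent known for: %s" % subcommand)
    return ["svn", CVS_TO_SVN[subcommand]] + list(args[1:])
```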

> c) Linking to Zope
> 
> Zope is a great application framework that already supports WebDAV. It is

Actually, it doesn't have very good support, last I checked. And it is very
far from DeltaV.

> written in Python which has an easy mechanism for calling into C code, and 
> it already supports many different types of data access. Its own object 
> database is versioned. This suggests to me that asking the Digital 
> Creations people to support DeltaV and providing an alternative backend for 
> Zope could result in some quick wins for everyone, once the Subversion file 
> system becomes a little more stable.

I've talked with Paul Everitt (their CEO) about stuff along these lines.
They're certainly thinking about it, and *especially* w.r.t Subversion. Not
sure what he wants private/public, so I'll leave it there. There are
certainly a number of interesting ways to slice this.

Again: this would be a separate effort. Subversion itself will never need or
require Zope.

> d) Wiki

Sounds like CollabNet may be introducing something like this within Tigris.

>...
> We should all make a concerted effort to keep any documentation on 
> Subversion as up-to-date as possible. The documentation that exists now is 
> really lacking, but with a communal editing effort could be kept in sync 
> with the code.

I'll write a disclaimer here, because I know most people don't feel this
way. My opinion :-) ...

I don't need the doc to get my coding done. I'm familiar with all the pieces
that I need to interact with, and then some. I know our end target. All that
is missing is getting the code done. Updating the doc is merely a
distraction. Periodically, I update webdav-usage.html, but that is mostly to
clarify/review my own thoughts.

We aren't at a stage where the code is usable, so doc certainly isn't needed
for users. It might be nice for developers, but I'm not too worried about
that either. We've seen plenty of people come on board with the current doc.
The project started with *four* of us: Karl, Ben, Jim, and myself. All the
rest are "new"; I'd say that we can get additional people even with the
current state of the doc.

</end-of-personal-thoughts :-)>

> e) Mount points in the file system
> 
> I have read in a recent message that sub-repositories were argued to death 
> and summarily rejected, but I'm at a bit of a loss to understand why. It

They weren't rejected. We'll support them, even in 1.0. You may have misread
a "punt for now" as "punt till 2.0".

>...
> f) Staying flexible enough for alternate uses

This isn't very "actionable", so I can't comment.

>...
> trade-off for the application. While Subversion is initially intended for 
> source code control only, is it possible to keep our options open?

Not true. SVN will be usable for any source control needs. I've already beat
Jim up to ensure that we have a way to store and retrieve a gigabyte-sized
file, without loading the whole bugger into memory. We've got some thinking
along those lines, but we'll know more when we sit down to code. I'm
certainly keeping it in mind.

> g) Modular form for Subversion server

Future version. We're going to ship something that can replace CVS. Then
we'll leave it in the dust with kick-ass new features.

>...
> h) Rewriting libs in other languages
> 
> Once the API has stabilized, it is important that the ability to access the

Already planned. I'll personally be ensuring there is a Python library.
Lefty said he wanted to work on it, so I'll probably just be assisting
and/or mentoring him with that.

>...
> i) Server controlled UI for conforming clients

Not sure on the utility of this.

>...
> Anyway, those are my thoughts for now. I'd appreciate hearing from anyone 
> who has any reaction at all to what I've written, even if it is a belief that 
> I should just go away with these high-faluting ideas for now while the 
> serious work is done.

Ideas are fine, but recognize that we already have an end goal in mind with
a limited/targeted feature set. In that sense, about the best we can do is
to take your ideas and drop them into the IDEAS file. Submitting ideas as a
patch against the file is a sure bet :-)

> I'd particularly like to hear from someone who has 
> information or ideas about the CVS proxy idea. Thanks.

Go with the inetd approach. Part of your building block will be your
favorite language API. The API will be a great addition regardless of
where/how the CVS proxy ends up (e.g. in terms of feature richness).

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: A few ideas concerning Subversion

Posted by Kevin Pilch-Bisson <ke...@pilch-bisson.net>.

On Mon, Feb 26, 2001 at 06:37:50PM -0800, Bruce Atherton wrote:
> i) Server controlled UI for conforming clients
> 
> It would be awfully nice if, right from the beginning, support was provided 
> for servers to have some control over the user interfaces of the clients. 
> This way, as each group customized their source code management system to 
> support their particular view of the world, no matter how controlled or how 
> lax, the clients could automatically adapt to provide the right user 
> interface. This might mean that people filling different roles would see 
> completely different user interfaces.
> 
> The current standard way to do this kind of server-controlled user 
> interface right now is through HTML Forms, but of course they are not 
> nearly rich enough to provide a complete user interface. I see three 
> possible contenders, each with their own flaws:
> 
>      1) XUL - seems to be oriented almost exclusively at web browsers. 
> Still, once DeltaV becomes integrated into Mozilla, XUL may allow Mozilla to 
> become a versioning client.
> 
>      2) UIML - the most general purpose solution, but it seems to be so 
> abstract as to provide no actual user interface elements that are usable at 
> this time. Perhaps that will change over time.

Take it from someone who has worked with UIML (see
kuimlrenderer.sourceforge.net), UIML is not supported well enough yet, and the
uiml.org group seems to to implement the ones they have.  I once thought it a
very promising tech (and who knows, it may one day be so), but the spec is so
abstract, it makes it hard for anyone to actually implement a conforming
renderer. It is something that could be kept in mind though.
> 
>      3) XForms - this is the option that seems to show the most promise, 
> although the standard is still developing.
-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Kevin Pilch-Bisson                    http://www.pilch-bisson.net
     "Historically speaking, the presences of wheels in Unix
     has never precluded their reinvention." - Larry Wall
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~