Posted to dev@subversion.apache.org by Chris Frost <ch...@frostnet.net> on 2007/05/24 02:40:59 UTC

A text-base penalty solution (without a working copy rewrite)

Preface: this email summarizes my understanding of the text-base penalty,
introduces an approach (implemented at http://scord.sf.net) that solves the
problem now, without rewriting the working copy API, and asks for any
feedback people may have.

There has been a fair amount of discussion of the subversion working
copy's text-base penalty on this list and the users list since at least
October 2001. For what it is worth, I have tried to compile a list of
related discussions: <http://scord.sf.net/#why>.

At a high level, text-base saves network bandwidth and supports the
offline operation of several commands (e.g. svn status, svn diff and svn
revert), at the cost of disk space equal to 100% of the working files,
i.e. roughly doubling a checkout's footprint. Subversion is certainly
upfront about this tradeoff, explicitly stating its assumption that disk
space is plentiful, but the tradeoff does unfortunately limit subversion's
use cases: specifically, large source code repositories (e.g. when SCO
wanted to move all their source code from SCCS to subversion and check
out all code on each developer's computer) and media repositories. Disks
may grow, but people keep pace with more and/or larger files.
(Additionally, disk bandwidth tends to increase more slowly than disk
capacity.) This tradeoff is certainly well understood here; I believe
that it remains because any solution appears to require a working copy
API rewrite (breaking backwards compatibility) and (to a lesser extent?)
because no single solution has stood out as the obvious end solution
(don't store any pristine files? compress them? follow SVK and store the
entire repo? and others).

I do feel that subversion has had fair discussions on this topic and
that the long-term plan will probably lead to a solid client metadata
storage design. However, given that this issue has persisted since at
least 2001, I have a suggestion for (at least) an interim solution:

Do not modify subversion. Instead, enhance the underlying filesystem
to detect and exploit file redundancies in working copy-style layouts.
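
As a rough illustration of the redundancy such a filesystem can exploit, here is a Python sketch. It is not scord's code: scord works inside a FUSE overlay rather than by scanning. The sketch assumes the working copy layout current as of this thread, with one .svn directory per versioned directory and each pristine copy stored at <dir>/.svn/text-base/<name>.svn-base. It walks the tree and counts how many text-base bytes are byte-identical to the corresponding working file. The function name text_base_redundancy is hypothetical.

```python
import filecmp
import os
from pathlib import Path


def text_base_redundancy(wc_root):
    """Return (working_bytes, base_bytes, redundant_bytes) for a working copy.

    Assumes the layout <dir>/.svn/text-base/<name>.svn-base, with one .svn
    directory in every versioned directory.
    """
    working_bytes = base_bytes = redundant_bytes = 0
    for dirpath, dirnames, filenames in os.walk(wc_root):
        # Skip the administrative area itself; os.walk honours in-place edits.
        if ".svn" in dirnames:
            dirnames.remove(".svn")
        text_base = Path(dirpath, ".svn", "text-base")
        for name in filenames:
            working = Path(dirpath, name)
            if not working.is_file():
                continue  # skip broken symlinks and special files
            size = working.stat().st_size
            working_bytes += size
            pristine = text_base / (name + ".svn-base")
            if not pristine.is_file():
                continue  # unversioned file: no pristine copy
            base_bytes += pristine.stat().st_size
            # shallow=False forces a byte-by-byte comparison of contents.
            if filecmp.cmp(working, pristine, shallow=False):
                redundant_bytes += size
    return working_bytes, base_bytes, redundant_bytes
```

Calling text_base_redundancy("path/to/wc") on an unmodified checkout should report most text-base bytes as redundant. Files with svn:keywords or svn:eol-style set are an exception: their pristine copy is stored untranslated, so a byte-exact comparison like this one undercounts them.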

I have implemented this approach, calling it "scord" (Subversion Check Out,
Reduced Disk), and believe it is now ready for use: <http://scord.sf.net>.
I am announcing scord here, as v0.9.0, in the hope of hearing from
interested subversion developers; I would very much appreciate any
thoughts on scord's design or its implementation! scord is currently
implemented as a userspace overlay filesystem (using FUSE) and runs on
Linux. scord uses the same file manipulation techniques as libsvn_wc to
avoid working copy corruption in the face of scord or system crashes
(details are in the HACKING file).


The biggest drawback I see with a filesystem-based approach is its
limited portability. scord currently supports only Linux. With a
little work scord should be portable to Mac OS X or FreeBSD, but
subversion runs on scores of platforms that scord does not. An NFS
server or preloaded library-based solution might support more (unix)
platforms, but as far as I know neither is a good match for Windows.
Nonetheless, Linux (plus hopefully OS X and FreeBSD) makes up a sizable
portion of subversion's install base.

Re: A text-base penalty solution (without a working copy rewrite)

Posted by "Ph. Marek" <ph...@bmlv.gv.at>.
On Thursday, 24 May 2007, Chris Frost wrote:
> On Thu, May 24, 2007 at 06:55:55AM +0200, Ph. Marek wrote:
> > So maybe an easier solution *could* be to switch to fsvs - see
> > http://fsvs.tigris.org.
> Aha; thanks for mentioning this, Phil! (I had actually glanced at the fsvs
> site earlier, but misinterpreted how it works.)
What could I improve in the description? Do you have any suggestions?

> To compare fsvs and scord: I understood that the working copy library
> would probably need a rewrite to reduce disk usage, so gave up that
> route and wrote scord to work behind subversion's back. You saw the
> need for the rewrite and took it head on, with more significant
> enhancements to boot :).
>
> fsvs's removal of .svn, its reduced disk usage overhead, and more are
> pretty cool.
I'd like to add: the main reasons for writing fsvs were metadata versioning
and speed.
If you have the time, compare "fsvs status" on a (mostly identical) working
copy with several *hundred* thousand files, on a cold cache, with
a "find -type f" (or something else that makes find call lstat() on each
entry) [but don't try svn for that]: fsvs should come out ahead, as it tries
to reduce hard disk seeks as much as possible.
On a hot cache it's a tiny bit slower, as it has a fair bit to do behind the 
scenes.
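
Phil does not say how fsvs reduces seeks. One generic heuristic for cold-cache scans is to issue a directory's lstat() calls in ascending inode-number order. This is an assumption, not fsvs's confirmed method. On filesystems such as ext3, inodes live in fixed on-disk tables, so this order lets the kernel read those tables roughly sequentially instead of jumping around. Here is a minimal Python sketch; the helper name lstat_in_inode_order is hypothetical.

```python
import os


def lstat_in_inode_order(directory):
    """Return {name: os.stat_result} for the entries of one directory.

    The lstat() calls are issued in ascending inode-number order, a
    seek-reduction heuristic for cold caches, not a guarantee.
    """
    with os.scandir(directory) as it:
        # On POSIX, DirEntry.inode() comes from readdir() itself, so
        # sorting by it costs no extra disk access.
        entries = sorted(it, key=lambda entry: entry.inode())
    # Visit the inodes in (roughly) their on-disk table order.
    return {entry.name: os.lstat(entry.path) for entry in entries}
```

A real tool would apply the same ordering across a whole tree and may batch directories as well. The sketch only shows the ordering idea for a single directory.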


> The scord alternatives list now also tries to contrast fsvs and scord;
> please let me know if you think the comparison could be improved!
No, I think it's sufficient. Thank you!


Regards,

Phil

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: A text-base penalty solution (without a working copy rewrite)

Posted by Chris Frost <ch...@frostnet.net>.
On Thu, May 24, 2007 at 06:55:55AM +0200, Ph. Marek wrote:
> But reading http://scord.sourceforge.net/ "scord compared with existing 
> solutions for subversion" I see that you're missing another alternative - 
> fsvs.
> 
> fsvs is used primarily for binary (or at least opaque) objects. These may be
> files, databases, whole filesystems (at the file level, not the block-device
> level!), or even an entire machine installation.
> 
> (Although it has not been tested for source code versioning, there's no
> obvious limitation, apart from (currently) not being able to do merges.)
> 
> 
> So maybe an easier solution *could* be to switch to fsvs - see 
> http://fsvs.tigris.org.

Aha; thanks for mentioning this, Phil! (I had actually glanced at the fsvs
site earlier, but misinterpreted how it works.)

To compare fsvs and scord: I understood that the working copy library
would probably need a rewrite to reduce disk usage, so gave up that
route and wrote scord to work behind subversion's back. You saw the
need for the rewrite and took it head on, with more significant
enhancements to boot :).

fsvs's removal of .svn, its reduced disk usage overhead, and more are pretty cool.

Even with fsvs, I think scord fills a helpful niche as a small tool that
allows one to use subversion (and tools built around it) as they are (and
as they develop), without incurring the current working copy format's space
overhead.

It is interesting that fsvs and scord seem to have taken opposite
approaches to which software they integrate with more closely to obtain
their benefits (fsvs with subversion and scord with the operating
system).

My particular use case does involve merges (the repository also includes
photo metadata files and the photo album software, which get edited on
multiple machines), but fsvs may allow me to version other projects.


> Sorry for being this late ... Maybe I should append this link to the issues 
> you mentioned.

Sounds like an excellent addition.

The scord alternatives list now also tries to contrast fsvs and scord;
please let me know if you think the comparison could be improved!

-- 
Chris Frost  |  <http://www.frostnet.net/chris/>
-------------+----------------------------------------
PGP: <http://www.frostnet.net/chris/about/pgp_key.txt>

Re: A text-base penalty solution (without a working copy rewrite)

Posted by "Ph. Marek" <ph...@bmlv.gv.at>.
Hello Chris!

On Thursday, 24 May 2007, Chris Frost wrote:
> ... My personal
> motivator for scord was to store my photo album in subversion (6GB in
> 22k image files, plus metadata files). Cached props significantly speed up
> common subversion commands. And storing only a single copy of each
> working file's property file saves 90MB.
That's a great piece of work you did here.
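
The 90MB figure in the quoted message has one plausible explanation, but it rests on an assumption the thread does not state. Suppose each duplicated property file is tiny but still occupies one 4 KiB filesystem block, a common ext3 block size. Then keeping a single copy for each of about 22k image files saves roughly 22,000 x 4,096 bytes. The arithmetic, in Python:

```python
# Assumption (not stated in the thread): each redundant property file
# occupies exactly one 4 KiB filesystem block, however small it is.
image_files = 22_000      # "22k image files" from the quoted message
block_size = 4096         # bytes per block (assumed)

saved_bytes = image_files * block_size
print(saved_bytes)            # 90112000
print(saved_bytes / 10**6)    # 90.112 (decimal megabytes)
```

That lands close to the reported 90MB. The real savings also depend on the additional metadata files and on the filesystem's actual block size.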

But reading http://scord.sourceforge.net/ "scord compared with existing 
solutions for subversion" I see that you're missing another alternative - 
fsvs.


fsvs is used primarily for binary (or at least opaque) objects. These may be
files, databases, whole filesystems (at the file level, not the block-device
level!), or even an entire machine installation.

(Although it has not been tested for source code versioning, there's no
obvious limitation, apart from (currently) not being able to do merges.)


So maybe an easier solution *could* be to switch to fsvs - see 
http://fsvs.tigris.org.


Sorry for being this late ... Maybe I should append this link to the issues 
you mentioned.


Regards,

Phil
