Posted to dev@subversion.apache.org by "Ph. Marek" <ph...@bmlv.gv.at> on 2004/02/10 07:44:13 UTC

Idea for libsvn_wc after 1.0 ...

Hello everybody!

[DISCLAIMER: I read the mailing lists, and I looked into the issues. But of 
course something could have escaped me]


Some event I haven't yet been able to isolate trashed some of my files, so I 
once again rethought my backup strategy. I came to the conclusion that instead 
of copying my /home to another disk from time to time I can do better: I should 
version my /home, so that I can do a commit on every shutdown and "nothing 
ever will be lost" (TM).

Then I thought about my favourite version control system (svn :-) and found 
that some improvements would be needed.

(Please read further, the interesting part comes later)


- First of all, the .svn-directories are a bit distracting. Furthermore, in 
one of my projects my wc has about 100 000 inodes, of which 78800 are in .svn 
directories (ok, I've got properties on nearly every file). So that's a bit 
inefficient. Of course, there are already some issues filed for this:
	rethink .svn/ area read-only files and dirs strategy
		http://subversion.tigris.org/issues/show_bug.cgi?id=1294
	Need support for opaque collections/"document bundles" in wc
		http://subversion.tigris.org/issues/show_bug.cgi?id=707

- Next, the text-base files are unnecessary in some cases. If svn is used as a 
backup medium, a diff is seldom required, so the space could be saved.
	Store text-base compressed
		http://subversion.tigris.org/issues/show_bug.cgi?id=908

- Furthermore, the performance is sometimes not as it could be.
	performance bad in "svn mv" with whole directories
		http://subversion.tigris.org/issues/show_bug.cgi?id=1284


So what's the problem, what's the answer?

Of course I'll choose the easy target - libsvn_wc.
Part of the problem is that there's a directory tree to be remembered, to know 
what the base of the wc is. This is currently done in the .svn-directories, 
in files like entries, dir-props, and some directories.

But wait - we already do that! Yes, in an ingenious piece of software already 
in svn! It is called libsvn_fs.

So - answer: Why don't we use the already perfectly working part for the 
repository on the wc side too?
It stores the tree-structure, has multiple versions (base of wc, current wc 
state), can store the properties inside, and even the textbase!

So I propose to *allow* for something like this (discussion below)
- On checkout there's a parameter "--with-metadata-to DIR" which creates a 
structure similar to the repository in the given directory, to save the 
wc-meta-information, and makes an entry in a ~/.subversion-file which says
"everything below this checked-out directory belongs to the meta-data at DIR".
That gives us 707, 1294, and (possibly) 1284, as there is no .svn/entries-file 
to be opened several hundred times for a bigger "svn mv".

- Another parameter for checkout is "--save-text-base-for N", where N can be a 
number of days or similar.
On diff all files diffed are cached in db/strings in DIR; on commit the 
changed files are stored there, too (so the deltification data can be computed 
locally), and on every use of a cached file a timestamp for this file is set 
to the current date.
In some crontab or similar the user can add an entry like "svnadmin purgewc 
DIR", which deletes "old", unused entries out of db/strings.
That gives us 908.

- When invoked, svn reads ~/.subversion/working-copies and looks for the 
longest matching path, and looks into the current directory for a 
subdirectory .svn.
If only one is present, no problem.
If both are, then **MAKE A POLICY DECISION HERE**. (hehe :-)

- locks are done by setting a lock bit for the current wc dir in the stored 
directory tree, and for each leaf directory "downwards", like the current 
locks. 
<speculative>I'm not sure about bdb, but updating single bits is probably 
faster than creating files. Possibly there should be a single bitmap for 
these lock-entries, mmapped or so for maximum performance. </>

- (A bit off-topic:) To achieve a (nearly) un-supervised execution of backups 
(as in 707) the current perl-script svn_load_dirs.pl could be extended with 
Manber hashing, as already discussed here (see 
http://marc.theaimsgroup.com/?l=subversion-dev&m=106810408730390&w=2).
And the various checksums could be stored in the "local" repository.
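
To make the lookup step above (longest matching path versus a local .svn) 
concrete, here is a minimal sketch in Python. Everything in it is an 
assumption for illustration: the registry is a plain dict standing in for the 
proposed ~/.subversion/working-copies file, find_metadata_dir and the result 
labels are hypothetical names, and the "both present" case is only reported, 
not decided.

```python
import os

def find_metadata_dir(cwd, registry, dot_svn_exists=os.path.isdir):
    """Decide where the wc metadata for cwd lives.

    registry maps checked-out root paths to metadata directories
    (a hypothetical stand-in for ~/.subversion/working-copies).
    Returns a (kind, location) tuple.
    """
    cwd = os.path.normpath(cwd)
    best = None
    for root, meta in registry.items():
        root = os.path.normpath(root)
        # Compare whole path components, so /home/phil does not
        # accidentally match /home/philip.
        prefix = root.rstrip(os.sep) + os.sep
        if cwd == root or cwd.startswith(prefix):
            # Keep the longest (most specific) matching root.
            if best is None or len(root) > len(best[0]):
                best = (root, meta)

    dot_svn = os.path.join(cwd, ".svn")
    has_dot_svn = dot_svn_exists(dot_svn)

    if best and has_dot_svn:
        # Both are present: the policy decision is left open here.
        return ("conflict", best[1])
    if best:
        return ("external", best[1])
    if has_dot_svn:
        return ("dot-svn", dot_svn)
    return ("none", None)
```

The dot_svn_exists parameter only exists so the check can be replaced in a 
test; a real client would simply look at the disk.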


The more I thought about using a Berkeley (or whatever) database for the wc 
(*with the same or a very similar codebase as libsvn_fs*) the more 
similarities I found, which would make that easy (at least to the naive mind 
of a non-developer).
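
Coming back to the "--save-text-base-for N" idea: the purge step could be 
sketched like this in Python. This is only an illustration under loud 
assumptions: a plain directory of files stands in for db/strings (which is 
really a database table), the file's mtime serves as the "last used" 
timestamp, and mark_used and purge_unused are hypothetical names, not 
existing svn or svnadmin functionality.

```python
import os
import time

SECONDS_PER_DAY = 86400

def mark_used(path, now=None):
    """Record that a cached text-base was used by bumping its timestamp."""
    now = time.time() if now is None else now
    os.utime(path, (now, now))

def purge_unused(cache_dir, max_age_days, now=None):
    """Delete cached entries not used within max_age_days.

    Returns the sorted list of removed entry names.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    removed = []
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        # Only plain files count as cache entries in this sketch.
        if os.path.isfile(path) and os.stat(path).st_mtime < cutoff:
            os.remove(path)
            removed.append(name)
    return removed
```

A cron job would then call something like purge_unused(DIR, N) once a day; a 
purged text-base would have to be fetched from the repository again on the 
next diff.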


Pro & Con:
+ reuses existing code from libsvn_fs, with minor modifications.
+ possibly allows for restructuring of libsvn_wc, which is a fragile piece of 
work (at least that's my impression reading the mailing list)
+ That solves some issues, at least the four above.
+ It may be a performance improvement.
+ It should save some space. On a 4k filesystem the 
~ the current way could be kept, as it's much easier for moving wc's around
(although I don't really buy the point - anyone who can move wc's can update a 
pointer in a ~/subversion/-file).
- changes to current architecture are needed.


Let the discussion arise :-)


Regards,

Phil


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Idea for libsvn_wc after 1.0 ...

Posted by Greg Hudson <gh...@MIT.EDU>.
On Tue, 2004-02-10 at 06:17, Ph. Marek wrote:
> > con: the WC loses platform independence (i.e., you can't share it and
> > expect it to work under linux, mac, and windows).
> I'd expect to have a processor-dependence, ie. big vs. little-endian.
> Is the db-format different between linux/windows/MacOS/etc.???

BDB application data is stored in a platform-independent file.  Since
Subversion's application data is itself platform-independent, the actual
files like "nodes" are platform-independent.

However, to access BDB data, you must mmap() a file as a region of
shared memory and treat some of the contents as pthread_mutex_t
structures.  (On Windows, something equivalent happens.)  Since no
shared filesystem can implement mmap() in a sane manner, you cannot use
BDB on a shared filesystem, no matter how well it implements POSIX
locking semantics.  Moreover, you cannot copy a database from one
platform to another and have it work, unless the platforms have
identical memory representations of pthread_mutex_t and some other
structures.

Your proposal would have more applicability if we had a filesystem
back-end without so many restrictions.  I hope to create such a back end
some day, perhaps next winter.  (My real job tends to get busy in the
spring and summer.  Of course, other people are also welcome to have a
go at this task, and I can contribute ideas, but it's a pretty big
undertaking.)

At any rate, as clkao points out, you should look into svk.  I think
it's very close to what you propose.



Re: Idea for libsvn_wc after 1.0 ...

Posted by "Ph. Marek" <ph...@bmlv.gv.at>.
> > Pro & Con:
> > + uses already done code from libsvn_fs, with minor modifications.
> > + possibly allows for restructuring of libsvn_wc, which is a fragile
> > piece of work (at least that's my impression reading the mailing list)
> > + That solves some issues, at least the four above.
> > + It may be a performance improvement.
> > + It should save some space. On a 4k filesystem the
> > ~ the current way could be left, as it's much easier for moving wc's
> > around. (although I don't really buy the point - who can move wc's can
> > update a pointer in a ~/subversion/-file.
> > - changes to current architecture are needed.
>
> con: the WC loses platform independence (i.e., you can't share it and
> expect it to work under linux, mac, and windows).
I'd expect to have a processor-dependence, ie. big vs. little-endian.
Is the db-format different between linux/windows/MacOS/etc.???


Regards,

Phil



Re: Idea for libsvn_wc after 1.0 ...

Posted by John Szakmeister <jo...@szakmeister.net>.
On Tuesday 10 February 2004 02:44, Ph. Marek wrote:
> [snip]
> Pro & Con:
> + uses already done code from libsvn_fs, with minor modifications.
> + possibly allows for restructuring of libsvn_wc, which is a fragile piece
> of work (at least that's my impression reading the mailing list)
> + That solves some issues, at least the four above.
> + It may be a performance improvement.
> + It should save some space. On a 4k filesystem the
> ~ the current way could be left, as it's much easier for moving wc's
> around. (although I don't really buy the point - who can move wc's can
> update a pointer in a ~/subversion/-file.
> - changes to current architecture are needed.

con: the WC loses platform independence (i.e., you can't share it and expect 
it to work under linux, mac, and windows).

-John



Re: Idea for libsvn_wc after 1.0 ...

Posted by Chia-Liang Kao <cl...@clkao.org>.
I'm a libsvn_wc and .svn hater. What you want can actually be done
with the current "svk import" feature if you are backing up to
a local repository. It does what svn_load_dirs.pl does, only much faster
(every import is as fast as the first "svn import"),
but it doesn't ask you whether files have been renamed.

Cheers,
CLK

On Tue, Feb 10, 2004 at 08:44:13AM +0100, Ph. Marek wrote:
> As some event I couldn't yet isolate trashed some of my files I once again 
> rethought my backup-strategy. I came to the conclusion that instead of 
> copying my /home to another disk from time to time I can do better: I should 
> versionate my /home, so that I can do a commit on every shutdown and "nothing 
> ever will be lost" (TM).
> 
> Then I thought about my favourite version control system (svn :-) and found 
> that there would be some improvements to be done.
