Posted to dev@subversion.apache.org by Troy Curtis Jr <tr...@gmail.com> on 2006/10/04 03:46:47 UTC
Delta Question
So I understand that Subversion stores files inside the repository as
deltas against previous versions (fsfs) or future versions (bdb). I
also understand that using the '--deltas' option with 'svnadmin dump'
greatly reduces the size of the dump because of the same
deltafication.
My question to you is can you think of a case (actually a reason since
I have a case) where the deltafied dump file is ~25% the size of the
actual repository? Here are my stats (approx. as I can't remember the
exact values...this is a repo at work)
RCS: ~1GB
Subversion: ~2GB (only a couple hundred megs different between fsfs and bdb)
Dump file: <500MB
This seems like a very odd thing to happen. Of course I am using the
cvs2svn script, so can anyone think of an option/configuration I can
use to keep my repository from ballooning? I have a lot of
'unnamed-*' branches which I originally thought might be the issue
(like maybe they weren't true cheap branches or something), but I only
save 200-300 MB if I only convert the trunk.
This RCS repository has been around since circa 1993 and has moved between
no fewer than three different OSes (DEC Alpha -> Solaris 8 -> Red Hat
8.0). I did run into some formatting issues, but only on a couple of
files, and I was able to fix them.
It seems to me the deltafication for the repository and the dump file
should be using the same basic algorithm, but perhaps I am way off
base.
Any ideas?
Thanks,
Troy
--
"Beware of spyware. If you can, use the Firefox browser." - USA Today
Download now at http://getfirefox.com
Registered Linux User #354814 ( http://counter.li.org/)
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Re: Delta Question
Posted by Greg Hudson <gh...@MIT.EDU>.
On Oct 3, 2006, at 11:46 PM, Troy Curtis Jr wrote:
> My question to you is can you think of a case (actually a reason since
> I have a case) where the deltafied dump file is ~25% the size of the
> actual repository? Here are my stats (approx. as I can't remember the
> exact values...this is a repo at work)
>
> RCS: ~1GB
> Subversion: ~2GB (only a couple hundred megs different between fsfs
> and bdb)
> Dump file: <500MB
The problem here is mostly with directory data, in all
likelihood. Subversion's storage of file data is reasonably
efficient, but its storage of directory data is quite wasteful,
particularly when you have lots of single-file commits within big
directories. Repositories resulting from cvs2svn often have a lot of
that depending on how cvs was used. Dump files don't have this
problem since they don't try to support efficient random access or
traversal of directory data.
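To put rough numbers on that, here is a toy cost model in Python. It is
only a sketch: it assumes every single-file commit writes a complete new
listing for the directory it touches, while a dump records just the
changed path. The function names, the 50-byte entry size, and the
directory and commit counts are all made up for illustration, and the
real back ends differ in detail.

```python
# Toy model, not Subversion's actual storage format.
# Assumption: a repository back end writes the full directory listing on
# every commit that touches the directory; a dump writes only the changed path.

def full_listing_cost(entries_per_dir, commits, bytes_per_entry=50):
    """Bytes written if each single-file commit rewrites the whole listing."""
    return entries_per_dir * bytes_per_entry * commits

def changed_path_cost(commits, bytes_per_entry=50):
    """Bytes written if each single-file commit records one changed path."""
    return bytes_per_entry * commits

# Hypothetical workload: 10,000 single-file commits in a 1,000-entry directory.
repo_bytes = full_listing_cost(entries_per_dir=1000, commits=10000)
dump_bytes = changed_path_cost(commits=10000)

print(repo_bytes, dump_bytes, repo_bytes // dump_bytes)
```

Under these assumptions the directory overhead grows with the size of
the directory, while the dump overhead does not, which is the shape of
the gap described above.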
When I was more involved with Subversion, I thought about some ways
to store directory data more efficiently. My best idea was a btree
with multiple roots, each root representing a revision of the
directory. (This would require storing all of the directory
revisions together, so wouldn't work for FSFS, but could be applied
to BDB or some future back end.) I never fully fleshed out the idea
or even documented it properly, though.
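As a loose illustration of the "multiple roots" idea, the sketch below
uses a path-copying binary search tree rather than a real btree, and it
is not taken from any Subversion design. Each change to a directory
copies only the nodes on the path to the changed entry and yields a new
root, so every revision keeps its own root while unchanged subtrees are
shared. The names Node, put, get, and the "name@rN" ids are hypothetical.

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    """One directory entry in an immutable binary search tree."""
    name: str
    node_id: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def put(root, name, node_id):
    """Return a new root with name mapped to node_id; the old root stays valid."""
    if root is None:
        return Node(name, node_id)
    if name < root.name:
        return root._replace(left=put(root.left, name, node_id))
    if name > root.name:
        return root._replace(right=put(root.right, name, node_id))
    return root._replace(node_id=node_id)

def get(root, name):
    """Look up an entry as seen from the given revision root."""
    while root is not None:
        if name == root.name:
            return root.node_id
        root = root.left if name < root.name else root.right
    return None

# Revision 1 of the directory: three entries.
r1 = None
for entry in ["m.c", "a.c", "z.c"]:
    r1 = put(r1, entry, entry + "@r1")

# Revision 2 changes a single file; only the path to "z.c" is copied.
r2 = put(r1, "z.c", "z.c@r2")
roots = [r1, r2]  # one root per directory revision

print(get(roots[0], "z.c"), get(roots[1], "z.c"))
print(roots[1].left is roots[0].left)  # untouched subtree is shared
```

The cost of a single-file commit in this sketch is proportional to the
depth of the tree rather than the number of entries in the directory.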