Posted to dev@subversion.apache.org by Troy Curtis Jr <tr...@gmail.com> on 2006/10/04 03:46:47 UTC

Delta Question

So I understand that Subversion stores files inside the repository as
deltas against previous versions (fsfs) or future versions (bdb).  I
also understand that using the '--deltas' option with 'svnadmin dump'
greatly reduces the size of the dump because of the same
deltafication.

My question to you is can you think of a case (actually a reason since
I have a case) where the deltafied dump file is ~25% the size of the
actual repository?  Here are my stats (approx. as I can't remember the
exact values...this is a repo at work)

RCS: ~1GB
Subversion: ~2GB (only a couple hundred megs different between fsfs and bdb)
Dump file: <500MB

This seems like a very odd thing to happen.  Of course I am using the
cvs2svn script, so can anyone think of an option/configuration I can
use to keep my repository from ballooning?  I have a lot of
'unnamed-*' branches which I originally thought might be the issue
(like maybe they weren't true cheap branches or something), but I only
save 200-300 MB if I only convert the trunk.

This RCS repository has been around since circa 1993 and has moved between
no fewer than three different OSes (DEC Alpha -> Solaris 8 -> Red Hat
8.0).  I did run into some formatting issues, but only on a couple of
files, and I was able to fix them.

It seems to me the deltafication for the repository and the dump file
should be using the same basic algorithm, but perhaps I am way off
base.

Any ideas?

Thanks,
Troy

-- 
"Beware of spyware. If you can, use the Firefox browser." - USA Today
Download now at http://getfirefox.com
Registered Linux User #354814 ( http://counter.li.org/)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Delta Question

Posted by Greg Hudson <gh...@MIT.EDU>.
On Oct 3, 2006, at 11:46 PM, Troy Curtis Jr wrote:
> My question to you is can you think of a case (actually a reason since
> I have a case) where the deltafied dump file is ~25% the size of the
> actual repository?  Here are my stats (approx. as I can't remember the
> exact values...this is a repo at work)
>
> RCS: ~1GB
> Subversion: ~2GB (only a couple hundred megs different between fsfs  
> and bdb)
> Dump file: <500MB

The problem here is, in all likelihood, mostly with directory data.
Subversion's storage of file data is reasonably
efficient, but its storage of directory data is quite wasteful,  
particularly when you have lots of single-file commits within big  
directories.  Repositories resulting from cvs2svn often have a lot of
that, depending on how CVS was used.  Dump files don't have this
problem since they don't try to support efficient random access or  
traversal of directory data.
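
To make the arithmetic concrete, here is a rough back-of-the-envelope
sketch.  It assumes the simplest possible model, in which every
single-file commit writes a complete new listing of the containing
directory; that is an assumption for illustration, not a description of
what FSFS or BDB actually writes.  The function names and all of the
numbers are hypothetical.

```python
def directory_bytes_written(entries, bytes_per_entry, commits):
    """Bytes of directory data written if every commit stores a full listing.

    Assumption (not taken from the Subversion source): each single-file
    commit rewrites the entire listing of the directory holding the file.
    """
    return entries * bytes_per_entry * commits


def file_bytes_written(delta_bytes, commits):
    """Bytes of file data written if every commit stores one small delta."""
    return delta_bytes * commits


# Hypothetical numbers, chosen only for illustration.
entries = 500          # files in one big directory
bytes_per_entry = 60   # entry name plus node id
commits = 10_000       # single-file commits touching that directory
delta_bytes = 200      # average size of one file delta

dir_total = directory_bytes_written(entries, bytes_per_entry, commits)
file_total = file_bytes_written(delta_bytes, commits)

print(f"directory data: {dir_total / 1e6:.0f} MB")
print(f"file data:      {file_total / 1e6:.0f} MB")
```

With these made-up numbers the directory data comes to 300 MB against
2 MB of file deltas, which is the kind of imbalance a dump file avoids.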

When I was more involved with Subversion, I thought about some ways  
to store directory data more efficiently.  My best idea was a btree  
with multiple roots, each root representing a revision of the  
directory.  (This would require storing all of the directory  
revisions together, so wouldn't work for FSFS, but could be applied  
to BDB or some future back end.)  I never fully fleshed out the idea  
or even documented it properly, though.
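
A minimal sketch of the multiple-roots idea, for illustration only.  It
uses a plain persistent binary search tree with path copying rather than
a real btree, and it is not taken from any Subversion code; Node,
insert, lookup and the "name@rN" ids are all hypothetical names.  Each
commit copies only the nodes on the path to the changed entry, and every
revision keeps its own root.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    name: str                        # directory entry name (the key)
    node_id: str                     # hypothetical id of the entry's node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, name, node_id):
    """Return a new root; only nodes on the path to `name` are copied."""
    if root is None:
        return Node(name, node_id)
    if name < root.name:
        return Node(root.name, root.node_id,
                    insert(root.left, name, node_id), root.right)
    if name > root.name:
        return Node(root.name, root.node_id,
                    root.left, insert(root.right, name, node_id))
    return Node(name, node_id, root.left, root.right)  # replace the entry


def lookup(root, name):
    """Return the node id stored under `name`, or None if it is absent."""
    while root is not None:
        if name == root.name:
            return root.node_id
        root = root.left if name < root.name else root.right
    return None


# One root per directory revision; roots[i] is the directory at revision i.
roots = [None]
for name in ["m.c", "d.c", "t.c", "a.c", "z.c"]:
    roots.append(insert(roots[-1], name, f"{name}@r{len(roots)}"))

# A single-file commit to z.c creates revision 6.
roots.append(insert(roots[-1], "z.c", "z.c@r6"))

print(lookup(roots[5], "z.c"), lookup(roots[6], "z.c"))
print(roots[6].left is roots[5].left)  # untouched subtree is shared
```

The old roots stay valid, so any revision of the directory can still be
looked up, while a single-file commit costs a few copied nodes instead
of a whole new listing.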
