You are viewing a plain text version of this content. The canonical link for it is here.

Posted to users@subversion.apache.org by Michael Bartholomäus <ba...@gmx.de> on 2004/05/21 21:51:44 UTC

subversion's storing / diffing techniques

Hi,

I am studying computer science at Bonn / germany and writing my diploma 
thesis about

"version control in an xml-based content management system".

I know about RCS/CVS that it stores the current version as is and that 
it uses backward deltas
for older versions to reduce storage amount (after diffing the text 
files). In the subversion
book I read that even in case of binary data subversion is capable of 
not having to store the complete file.

Now my questions:
Is there any documentation about the way subversion computes what has 
changed in a file?
When requesting an old version of a file - how is it reconstructed?
How does subversion reduce the amount of data it has to store?
What is the actual storing technique used by subversion?
What information is stored in the underlying Berkley DB?

(Of course I am especially interested in the question what happens when 
committing
xml data.)

Thanks in advance for answering my questions.

Yours sincerely

Michael Bartholomäus

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: subversion's storing / diffing techniques

Posted by Greg Hudson <gh...@MIT.EDU>.

On Fri, 2004-05-21 at 17:51, Michael Bartholomäus wrote:
> Is there any documentation about the way subversion computes what has 
> changed in a file?
> When requesting an old version of a file - how is it reconstructed?
> How does subversion reduce the amount of data it has to store?
> What is the actual storing technique used by subversion?
> What information is stored in the underlying Berkley DB?

We use an algorithm called vdelta (created by Phong Vo at AT&T, I
believe) to compute the differences between one file and another.  We
apply the vdelta algorithm to successive corresponding "windows" of the
source and target file.  The output of the algorithm is a sequence of
instructions which say how to reconstruct the target window by copying
ranges of bytes from the source window or from the already-constructed
part of the target window, or by inserting bits of "new data."  The
instructions are then encoded in a format called svndiff.

That only answers part of the question, of course.  Given that we have a
binary delta algorithm (which can also be used as a compression
algorithm, by taking the delta between the empty stream and the target),
how do we store the contents of a file across various changes?  We use a
technique called "skip-deltas" so that it only takes O(lg(N)) deltas to
reconstruct any version of a file which has gone through N changes.  I
happen to have just written a long piece of mail about that to the users
list, which is presumably stuck in moderation; check the archives
periodically for that.  (Subject: "Re: Revision Differences").

Perhaps the easiest-to-retrieve documentation of the vdelta algorithm
lives in a long comment in
http://svn.collab.net/repos/svn/trunk/subversion/libsvn_delta/vdelta.c,
around line 125.

The svndiff format we use to encode vdelta instructions is documented in
http://svn.collab.net/repos/svn/trunk/notes/svndiff.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: subversion's storing / diffing techniques

Posted by Greg Hudson <gh...@MIT.EDU>.

On Fri, 2004-05-21 at 17:51, Michael Bartholomäus wrote:
> Is there any documentation about the way subversion computes what has 
> changed in a file?
> When requesting an old version of a file - how is it reconstructed?
> How does subversion reduce the amount of data it has to store?
> What is the actual storing technique used by subversion?
> What information is stored in the underlying Berkley DB?

We use an algorithm called vdelta (created by Phong Vo at AT&T, I
believe) to compute the differences between one file and another.  We
apply the vdelta algorithm to successive corresponding "windows" of the
source and target file.  The output of the algorithm is a sequence of
instructions which say how to reconstruct the target window by copying
ranges of bytes from the source window or from the already-constructed
part of the target window, or by inserting bits of "new data."  The
instructions are then encoded in a format called svndiff.

That only answers part of the question, of course.  Given that we have a
binary delta algorithm (which can also be used as a compression
algorithm, by taking the delta between the empty stream and the target),
how do we store the contents of a file across various changes?  We use a
technique called "skip-deltas" so that it only takes O(lg(N)) deltas to
reconstruct any version of a file which has gone through N changes.  I
happen to have just written a long piece of mail about that to the users
list, which is presumably stuck in moderation; check the archives
periodically for that.  (Subject: "Re: Revision Differences").

Perhaps the easiest-to-retrieve documentation of the vdelta algorithm
lives in a long comment in
http://svn.collab.net/repos/svn/trunk/subversion/libsvn_delta/vdelta.c,
around line 125.

The svndiff format we use to encode vdelta instructions is documented in
http://svn.collab.net/repos/svn/trunk/notes/svndiff.

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org