Posted to users@subversion.apache.org by Cyrus Jones <cy...@claripure.net> on 2005/10/18 16:00:42 UTC

Download Deltas Part II

Thanks for the response on my earlier question. The reason I asked is
that I ran a test with a large compressed file. I checked this file in on
my development machine and then I checked out the file on a second
machine. I then modified the file, making several random changes
throughout it, and checked that in. When I updated the file on the
second machine, it appears that SVN simply pulled the entire file down
instead of just the changes. Is there some kind of threshold where, if a
file is too "noisy", SVN simply resorts to pulling down a new copy?

Thanks
Cy.

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Download Deltas Part II

Posted by Ryan Schmidt <su...@ryandesign.com>.
On Oct 18, 2005, at 18:00, Cyrus Jones wrote:

> Thanks for the response on my earlier question. The reason I asked
> is that I ran a test with a large compressed file. I checked this
> file in on my development machine and then I checked out the
> file on a second machine. I then modified the file, making several
> random changes throughout it, and checked that in. When I
> updated the file on the second machine, it appears that SVN simply
> pulled the entire file down instead of just the changes. Is there
> some kind of threshold where, if a file is too "noisy", SVN
> simply resorts to pulling down a new copy?

Not that I'm aware of.

By what means did you determine that Subversion was doing this?

Depending on the compression algorithm you used, the modified
compressed file may bear little resemblance to the original
compressed file. This applies to some other binary formats too.
Subversion will still work fine; it's just that the space savings the
differencing is supposed to achieve aren't necessarily realized
there. Differencing works best for uncompressed text files.




Re: Download Deltas Part II

Posted by Daniel Berlin <db...@dberlin.org>.
On Tue, 2005-10-18 at 12:00 -0400, Cyrus Jones wrote:
> Thanks for the response on my earlier question. The reason I asked is
> that I ran a test with a large compressed file. I checked this file in on
> my development machine and then I checked out the file on a second
> machine. I then modified the file, making several random changes
> throughout it, and checked that in. When I updated the file on the
> second machine, it appears that SVN simply pulled the entire file down
> instead of just the changes. Is there some kind of threshold where, if a
> file is too "noisy", SVN simply resorts to pulling down a new copy?


Nope. But depending on the compression algorithm, stream alignment and
other issues may prevent the delta from being much smaller than the
entire file. zlib had this issue. There is actually a patch floating
around (it is in the rsync contrib directory) that makes zlib
rsync-friendly, which in turn makes it delta-friendly in general.

If you want a technical answer: if you remove a chunk 33 bytes long,
and everything after it shifts 33 bytes to the left to fill the gap,
the data will be misaligned such that the delta algorithm won't notice
that the bytes are the same again until it reaches an offset x with
x % 33 = 0, where x is one of the points at which it computes checksums
to look for matches.

If this happens quite often, it's possible the delta algorithm never
finds matches.
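A simplified Python model of that effect (not Subversion's actual xdelta code; BLOCK, the function names and the test data are made up for illustration): blocks of the old file are indexed at fixed offsets, and the new file, with a 33-byte chunk removed, is probed either only at those same fixed offsets or at every byte offset.

```python
import random

BLOCK = 64  # hypothetical block size, chosen only for illustration

def index_blocks(old: bytes) -> dict:
    """Map each block of `old` at offsets 0, BLOCK, 2*BLOCK, ... to its offset."""
    return {old[i:i + BLOCK]: i for i in range(0, len(old) - BLOCK + 1, BLOCK)}

def matched_bytes(old: bytes, new: bytes, every_offset: bool) -> int:
    """Count bytes of `new` that can be copied from `old`, one block at a time.

    every_offset=False probes `new` only at multiples of BLOCK (fixed points);
    every_offset=True probes every byte offset, which is what a rolling
    checksum makes cheap (this model compares raw bytes instead).
    """
    index = index_blocks(old)
    t = matched = 0
    while t + BLOCK <= len(new):
        if new[t:t + BLOCK] in index:
            matched += BLOCK
            t += BLOCK  # resume right after the copied block
        else:
            t += 1 if every_offset else BLOCK
    return matched

rng = random.Random(2005)
old = bytes(rng.getrandbits(8) for _ in range(4096))
new = old[:1000] + old[1033:]  # delete a 33-byte chunk

print("fixed probe points:", matched_bytes(old, new, False), "of", len(new))  # 960 of 4063
print("every offset:", matched_bytes(old, new, True), "of", len(new))         # 3968 of 4063
```

In this model, once the deletion shifts the data by 33 bytes, no fixed probe point lines up with an indexed block again, so everything after the edit is lost; probing every offset recovers all but the two old blocks that straddle the deletion.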

A good way to estimate how well Subversion's algorithm will do is to use
the xdelta command-line application to produce a delta between the two
versions of the file.

There are also various stream alignment techniques that try to discover
better starting points than just placing checksums every x bytes, but
we don't use them (mainly because nobody has implemented them).
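One family of such techniques picks the probe points from the content itself rather than from fixed offsets, so after an insertion or deletion the same bytes still produce the same probe points. The sketch below is an illustration only: WINDOW, BLOCK, the "sum of the previous WINDOW bytes is a multiple of 64" rule and the function names are all assumptions. It indexes blocks of the old file at such content-defined anchors and checks how many anchors in the new file, again with a 33-byte deletion, find a matching block.

```python
import random

BLOCK = 32   # hypothetical length of the block compared at each anchor
WINDOW = 16  # hypothetical length of the rolling-sum window

def anchors(data: bytes) -> list:
    """Offsets p where sum(data[p-WINDOW:p]) is a multiple of 64 and a full block fits."""
    found, s = [], 0
    for p in range(1, len(data) + 1):
        s += data[p - 1]                  # byte entering the window
        if p > WINDOW:
            s -= data[p - 1 - WINDOW]     # byte leaving the window
        if p >= WINDOW and s % 64 == 0 and p + BLOCK <= len(data):
            found.append(p)
    return found

def anchored_hits(old: bytes, new: bytes) -> tuple:
    """Return (hits, probes): new-file anchors whose block also starts at an old anchor."""
    index = {old[p:p + BLOCK] for p in anchors(old)}
    probes = anchors(new)
    hits = sum(new[q:q + BLOCK] in index for q in probes)
    return hits, len(probes)

rng = random.Random(2005)
old = bytes(rng.getrandbits(8) for _ in range(8192))
new = old[:1000] + old[1033:]  # delete a 33-byte chunk

hits, probes = anchored_hits(old, new)
print(hits, "of", probes, "content-defined probes found a match")
```

Because an anchor depends only on the WINDOW bytes before it, anchors more than WINDOW bytes past the deletion reappear at the same content in the new file, so nearly every probe hits without testing every offset.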


HTH,
Dan

