Posted to users@subversion.apache.org by Daniel Griscom <gr...@suitable.com> on 2006/02/03 15:24:06 UTC

Binary diffs: real-world differencing?

I'm looking into Subversion, and since I do a lot of multimedia work 
I'm especially interested in its ability to do binary diffs. But I'm 
wondering: has anyone documented how well this reduces 
traffic/storage with different types of real-world files? For 
instance:

- Change a small region of a JPEG file, and a much larger region may 
actually change

- Change a few lines of code, and a compiled executable's internal 
pointers and code locations may completely change

- Changing a few pixels of a GIF file with LZW compression may mean 
that different pixel strings are represented by different keys, 
changing the whole file

- Changing a small portion of an MSWord document may (for all I know) 
change the whole file

- Adding or removing a single file from a ZIP archive may (again) 
change the compression keys, thus substantially changing the data.
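To see why compressed formats can defeat differencing, here is a small Python sketch using the standard `zlib` module. Note that zlib implements Deflate (the compression usually used inside ZIP archives), not GIF's LZW, so this only illustrates the general effect. It flips one byte near the start of a synthetic, highly compressible input and counts how many bytes of the compressed output change. The sample data and the helper names are made up for illustration, and the exact counts you see will depend on the zlib version and settings.

```python
import random
import zlib


def count_differing_bytes(a: bytes, b: bytes) -> int:
    """Count byte positions that differ; any length difference also counts."""
    return sum(1 for x, y in zip(a, b) if x != y) + abs(len(a) - len(b))


def shared_prefix(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def make_sample(seed: int = 42, n_words: int = 20000) -> bytes:
    """Synthetic, highly compressible 'document' (illustrative data only)."""
    rng = random.Random(seed)
    words = [b"alpha", b"beta", b"gamma", b"delta", b"epsilon"]
    return b" ".join(rng.choice(words) for _ in range(n_words))


def main() -> None:
    original = make_sample()
    buf = bytearray(original)
    buf[10] ^= 0xFF  # flip every bit of a single byte near the start
    edited = bytes(buf)

    c_orig = zlib.compress(original, 9)
    c_edit = zlib.compress(edited, 9)

    print(f"uncompressed: {len(original)} bytes, "
          f"{count_differing_bytes(original, edited)} differ")
    print(f"compressed:   {len(c_orig)} vs {len(c_edit)} bytes, "
          f"{count_differing_bytes(c_orig, c_edit)} differ, "
          f"shared prefix {shared_prefix(c_orig, c_edit)} bytes")


if __name__ == "__main__":
    main()
```

A one-byte change in the input typically alters a large share of the compressed stream, because Deflate's Huffman tables and bit alignment shift after the edit; a byte-level differ then finds little to reuse.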

This leaves me wondering whether every time I change a binary file 
SVN will spend a long time carefully comparing the old and new 
versions, only to throw up its hands and copy the entire new file 
into the repository.


So, has anyone documented how well different types of binary files 
are differenced? If not, is there a way I can easily test it myself, 
perhaps with a command-line executable that takes two files and 
outputs a "binary difference" file?
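As a rough do-it-yourself test, here is a minimal Python sketch of a command-line tool that takes two files and estimates the size of a binary difference. It is not xdelta and not Subversion's own svndiff encoder; it is a deliberately simple copy/insert delta in the spirit of rsync: the old file is split into fixed-size blocks, every offset of the new file is looked up against those blocks, and matches become COPY instructions while everything else becomes INSERTed literal bytes. The 16-byte block size and the per-instruction byte costs are arbitrary assumptions, so treat the output only as a ballpark figure.

```python
import sys
from typing import Dict, List, Tuple, Union

BLOCK = 16  # assumed block size; real delta tools tune this

Copy = Tuple[str, int, int]   # ("COPY", source_offset, length)
Insert = Tuple[str, bytes]    # ("INSERT", literal_bytes)
Op = Union[Copy, Insert]


def make_delta(source: bytes, target: bytes, block: int = BLOCK) -> List[Op]:
    """Encode target as COPY ranges taken from source plus INSERTed literals."""
    # Index the first occurrence of every block-aligned chunk of the source.
    index: Dict[bytes, int] = {}
    for off in range(0, len(source) - block + 1, block):
        index.setdefault(source[off:off + block], off)

    ops: List[Op] = []
    literal = bytearray()
    i = 0
    while i < len(target):
        # A short slice near the end never matches a full-size key.
        src = index.get(target[i:i + block])
        if src is None:
            literal.append(target[i])
            i += 1
            continue
        # Extend the match forward as far as both buffers agree.
        length = block
        while (i + length < len(target) and src + length < len(source)
               and target[i + length] == source[src + length]):
            length += 1
        if literal:
            ops.append(("INSERT", bytes(literal)))
            literal = bytearray()
        ops.append(("COPY", src, length))
        i += length
    if literal:
        ops.append(("INSERT", bytes(literal)))
    return ops


def apply_delta(source: bytes, ops: List[Op]) -> bytes:
    """Rebuild the target from the source and the delta instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "COPY":
            _, off, length = op
            out += source[off:off + length]
        else:
            out += op[1]
    return bytes(out)


def encoded_size(ops: List[Op]) -> int:
    """Assumed cost model: 9 bytes per COPY, 5 bytes plus payload per INSERT."""
    return sum(9 if op[0] == "COPY" else 5 + len(op[1]) for op in ops)


def main(argv: List[str]) -> int:
    if len(argv) != 3:
        print(f"usage: {argv[0]} OLD_FILE NEW_FILE")
        return 2
    with open(argv[1], "rb") as f:
        old = f.read()
    with open(argv[2], "rb") as f:
        new = f.read()
    ops = make_delta(old, new)
    assert apply_delta(old, ops) == new  # sanity check: the delta round-trips
    print(f"new file: {len(new)} bytes, estimated delta: {encoded_size(ops)} bytes")
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Run it with the old and new versions of a file (for example, a JPEG before and after a small edit) as the two arguments. If the estimated delta is close to the size of the new file, a real delta tool is unlikely to do much better on that format either.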


Thanks,
Dan

-- 
Daniel T. Griscom             griscom@suitable.com
Suitable Systems              http://www.suitable.com/
1 Centre Street, Suite 204    (781) 665-0053
Wakefield, MA  01880-2400

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Binary diffs: real-world differencing?

Posted by Daniel Griscom <gr...@suitable.com>.
At 11:27 AM -0500 2/3/06, Kevin Greiner wrote:
>On 2/3/06, Daniel Griscom <gr...@suitable.com> wrote:
>>  I'm looking into Subversion, and since I do a lot of multimedia work
>>  I'm especially interested in its ability to do binary diffs. But I'm
>>  wondering: has anyone documented how well this reduces
>>  traffic/storage with different types of real-world files?
>
>Since 1.2.0, svn has defaulted to using xdelta binary diffs. See
>http://svn.haxx.se/users/archive-2005-05/1370.shtml. You can download
>a win32 cmd-line binary here
>http://evanjones.ca/software/xdelta-win32.html to see for yourself the
>size of diffs after the kind of changes you typically make.

Sounds close, but I'm an OS X/BSD kinda guy. I tried compiling the 
xdelta 1.1.3 sources, but got "unknown system" errors. I did find an 
OS X command-line version, but it's using xdelta version 3. What 
version does svn use?


Dan



Re: Binary diffs: real-world differencing?

Posted by Kevin Greiner <gr...@gmail.com>.
On 2/3/06, Daniel Griscom <gr...@suitable.com> wrote:
> I'm looking into Subversion, and since I do a lot of multimedia work
> I'm especially interested in its ability to do binary diffs. But I'm
> wondering: has anyone documented how well this reduces
> traffic/storage with different types of real-world files?

Since 1.2.0, svn has defaulted to using xdelta binary diffs. See
http://svn.haxx.se/users/archive-2005-05/1370.shtml. You can download
a win32 cmd-line binary here
http://evanjones.ca/software/xdelta-win32.html to see for yourself the
size of diffs after the kind of changes you typically make.

Mark's suggestion is good too. The FSFS repo stores forward diffs and
older revisions are never modified so the size of a given rev is a
good indicator of the binary diff size plus some overhead for directory
contents.



Re: Binary diffs: real-world differencing?

Posted by Mark Phippard <ma...@softlanding.com>.
Daniel Griscom <gr...@suitable.com> wrote on 02/03/2006 10:24:06 AM:

> So, has anyone documented how well different types of binary files 
> are differenced? If not, is there a way I can easily test it myself, 
> perhaps with a command-line executable that takes two files and 
> outputs a "binary difference" file?

Create an FSFS repository.  Import all of these file types into it.  Then 
check them out and modify and commit them one at a time.  Then look at the 
size of the revision file created in the repository; that gives you a good 
idea of how well each file type is differenced.  You could also trace the 
network connection to see how much data the client sends.  If you do the 
testing using the TortoiseSVN client against an http:// repository, it will 
report how much data is transferred.
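Mark's procedure (and Kevin's note that an FSFS revision's size approximates the binary diff size plus some directory overhead) can be partly automated. The Python sketch below lists the size of each revision file under an FSFS repository's `db/revs` directory. It assumes either the flat layout (`db/revs/N`) or the sharded layout used by later Subversion releases (`db/revs/SHARD/N`), and it skips anything whose file name is not a plain revision number, such as packed shards. The function name is illustrative, not part of any Subversion tool.

```python
import os
import sys
from typing import Dict, List


def revision_sizes(repo_path: str) -> Dict[int, int]:
    """Map revision number -> size in bytes of its FSFS revision file."""
    revs_dir = os.path.join(repo_path, "db", "revs")
    sizes: Dict[int, int] = {}
    # os.walk covers both the flat and the sharded layout; a missing
    # directory simply yields nothing.
    for dirpath, _dirnames, filenames in os.walk(revs_dir):
        for name in filenames:
            if name.isdigit():  # skip non-revision files such as pack files
                sizes[int(name)] = os.path.getsize(os.path.join(dirpath, name))
    return sizes


def main(argv: List[str]) -> None:
    if len(argv) != 2:
        print(f"usage: {argv[0]} REPO_PATH")
        return
    for rev, size in sorted(revision_sizes(argv[1]).items()):
        print(f"r{rev}: {size} bytes")


if __name__ == "__main__":
    main(sys.argv)
```

After each test commit, the size of the newest revision file gives a rough upper bound on what the delta for that change cost in storage.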

Mark


