Posted to users@subversion.apache.org by Daniel Griscom <gr...@suitable.com> on 2006/02/03 15:24:06 UTC
Binary diffs: real-world differencing?
I'm looking into Subversion, and since I do a lot of multimedia work
I'm especially interested in its ability to do binary diffs. But I'm
wondering: has anyone documented how well this reduces
traffic/storage with different types of real-world files? For
instance:
- Change a small region of a JPEG file, and a much larger region may
actually change
- Change a few lines of code, and a compiled executable's internal
pointers and code locations may completely change
- Changing a few pixels of a GIF file with LZW compression may mean
that different pixel strings are represented by different keys,
changing the whole file
- Changing a small portion of an MSWord document may (for all I know)
change the whole file
- Adding or removing a single file from a ZIP archive may (again)
change the compression keys, thus substantially changing the data.
This leaves me wondering whether every time I change a binary file
SVN will spend a long time carefully comparing the old and new
versions, only to throw up its hands and copy the entire new file
into the repository.
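One way to get a feel for the concern above is to compress two nearly
identical inputs and count how many byte positions differ afterwards. The
following is a minimal Python sketch, assuming zlib's DEFLATE as a stand-in
for the compression used inside formats such as ZIP. The sample data and the
helper names (differing_positions, make_sample, flip_one_byte) are made up
for illustration, and the result says nothing definitive about JPEG, GIF, or
MSWord files.

```python
import zlib


def differing_positions(a: bytes, b: bytes) -> int:
    """Count byte positions where a and b differ, plus any length difference."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches + abs(len(a) - len(b))


def make_sample() -> bytes:
    """Build compressible, text-like sample data (purely illustrative)."""
    return b"".join(b"line %d: some repeated payload\n" % i for i in range(2000))


def flip_one_byte(data: bytes, index: int) -> bytes:
    """Return a copy of data with a single bit flipped in the byte at index."""
    edited = bytearray(data)
    edited[index] ^= 0x01
    return bytes(edited)


if __name__ == "__main__":
    original = make_sample()
    edited = flip_one_byte(original, 10)
    raw_diff = differing_positions(original, edited)
    packed_diff = differing_positions(zlib.compress(original), zlib.compress(edited))
    print("raw bytes differing:       ", raw_diff)
    print("compressed bytes differing:", packed_diff)
```

If the second number comes out much larger than the first, the small edit
has spread through the compressed stream. Whether a delta algorithm still
finds reusable data in such a file is a separate question that only a test
against real files can answer.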
So, has anyone documented how well different types of binary files
are differenced? If not, is there a way I can easily test it myself,
perhaps with a command-line executable that takes two files and
outputs a "binary difference" file?
Thanks,
Dan
--
Daniel T. Griscom griscom@suitable.com
Suitable Systems http://www.suitable.com/
1 Centre Street, Suite 204 (781) 665-0053
Wakefield, MA 01880-2400
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org
Re: Binary diffs: real-world differencing?
Posted by Daniel Griscom <gr...@suitable.com>.
At 11:27 AM -0500 2/3/06, Kevin Greiner wrote:
>On 2/3/06, Daniel Griscom <gr...@suitable.com> wrote:
>> I'm looking into Subversion, and since I do a lot of multimedia work
>> I'm especially interested in its ability to do binary diffs. But I'm
>> wondering: has anyone documented how well this reduces
>> traffic/storage with different types of real-world files?
>
>Since 1.2.0, svn has defaulted to using xdelta binary diffs. See
>http://svn.haxx.se/users/archive-2005-05/1370.shtml. You can download
>a win32 cmd-line binary here
>http://evanjones.ca/software/xdelta-win32.html to see for yourself the
>size of diffs after the kind of changes you typically make.
Sounds close, but I'm an OS X/BSD kinda guy. I tried compiling the
xdelta 1.1.3 sources, but got "unknown system" errors. I did find an
OS X command-line version, but it's using xdelta version 3. What
version does svn use?
Dan
Re: Binary diffs: real-world differencing?
Posted by Kevin Greiner <gr...@gmail.com>.
On 2/3/06, Daniel Griscom <gr...@suitable.com> wrote:
> I'm looking into Subversion, and since I do a lot of multimedia work
> I'm especially interested in its ability to do binary diffs. But I'm
> wondering: has anyone documented how well this reduces
> traffic/storage with different types of real-world files?
Since 1.2.0, svn has defaulted to using xdelta binary diffs. See
http://svn.haxx.se/users/archive-2005-05/1370.shtml. You can download
a win32 cmd-line binary here
http://evanjones.ca/software/xdelta-win32.html to see for yourself the
size of diffs after the kind of changes you typically make.
Mark's suggestion is good too. An FSFS repo stores forward diffs, and
older revisions are never modified, so the size of a given rev file is
a good indicator of the binary diff size plus some overhead for
directory contents.
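For a rough, platform-independent feel before installing xdelta, the idea of
encoding a new file as copies from the old one plus literal new bytes can be
sketched in a few lines of Python. This is not xdelta and not what svn runs.
It is a crude block-matching estimate, and the function name
estimate_literal_bytes and the 64-byte block size are assumptions chosen for
illustration.

```python
def estimate_literal_bytes(old: bytes, new: bytes, block_size: int = 64) -> int:
    """Estimate how many bytes of `new` cannot be copied from `old`.

    `old` is cut into aligned blocks of block_size bytes. `new` is then
    scanned with a sliding window: a window that equals some old block is
    treated as a cheap "copy" instruction, anything else counts as one
    literal byte that a delta would have to carry.
    """
    old_blocks = {
        old[i:i + block_size]
        for i in range(0, len(old) - block_size + 1, block_size)
    }
    literal = 0
    i = 0
    while i < len(new):
        window = new[i:i + block_size]
        if len(window) == block_size and window in old_blocks:
            i += block_size  # whole block found in the old file
        else:
            literal += 1     # this byte would be shipped as new data
            i += 1
    return literal


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python estimate.py OLD_FILE NEW_FILE
    with open(sys.argv[1], "rb") as f_old, open(sys.argv[2], "rb") as f_new:
        old_data, new_data = f_old.read(), f_new.read()
    literal_count = estimate_literal_bytes(old_data, new_data)
    print(f"{literal_count} of {len(new_data)} bytes look new")
```

A result close to the full size of the new file suggests the format scrambles
its contents on every save; a small result suggests a real delta encoder has
something to work with. Real encoders match at finer granularity and compress
their output, so treat the number as a loose indication only.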
Re: Binary diffs: real-world differencing?
Posted by Mark Phippard <ma...@softlanding.com>.
Daniel Griscom <gr...@suitable.com> wrote on 02/03/2006 10:24:06 AM:
> I'm looking into Subversion, and since I do a lot of multimedia work
> I'm especially interested in its ability to do binary diffs. But I'm
> wondering: has anyone documented how well this reduces
> traffic/storage with different types of real-world files? For
> instance:
>
> - Change a small region of a JPEG file, and a much larger region may
> actually change
>
> - Change a few lines of code, and a compiled executable's internal
> pointers and code locations may completely change
>
> - Changing a few pixels of a GIF file with LZW compression may mean
> that different pixel strings are represented by different keys,
> changing the whole file
>
> - Changing a small portion of an MSWord document may (for all I know)
> change the whole file
>
> - Adding or removing a single file from a ZIP archive may (again)
> change the compression keys, thus substantially changing the data.
>
> This leaves me wondering whether every time I change a binary file
> SVN will spend a long time carefully comparing the old and new
> versions, only to throw up its hands and copy the entire new file
> into the repository.
>
>
> So, has anyone documented how well different types of binary files
> are differenced? If not, is there a way I can easily test it myself,
> perhaps with a command-line executable that takes two files and
> outputs a "binary difference" file?
Create an FSFS repository. Import all of these file types into it. Then
check them out and modify and commit them one at a time. Then look at the
size of the revision file created in the repository. That would give you a
good idea. You could also trace the network connection to see how much
data it sends. If you do the testing using the TortoiseSVN client to an
http:// repository it will report how much data is transferred.
Mark
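The measuring step of the procedure above can be scripted. What follows is a
minimal Python sketch, assuming an unpacked FSFS repository whose revision
files are named by revision number under db/revs, either directly or inside
numbered shard directories. The function name revision_file_sizes and the
example path are hypothetical.

```python
from pathlib import Path


def revision_file_sizes(repo_path):
    """Map revision number -> size in bytes of its FSFS revision file.

    Only files whose name is a plain number are counted, so packed shards
    and other bookkeeping files under db/revs are ignored.
    """
    revs_dir = Path(repo_path) / "db" / "revs"
    sizes = {}
    for path in revs_dir.rglob("*"):
        if path.is_file() and path.name.isdigit():
            sizes[int(path.name)] = path.stat().st_size
    return dict(sorted(sizes.items()))


if __name__ == "__main__":
    # "/path/to/test-repo" is a placeholder for a repository made with svnadmin create.
    for rev, size in revision_file_sizes("/path/to/test-repo").items():
        print(f"r{rev}: {size} bytes")
```

Running it after each commit and comparing the newest revision's size with
the size of the file just committed shows roughly how much the delta saved.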