Posted to users@subversion.apache.org by listman <li...@burble.net> on 2006/08/25 18:02:12 UTC

subversion performance issues (was perforce Vs subversion)

Hi, I wanted to summarize some discussions I've been having with some  
of the SVN developers offline as well as the discussion we've started  
to have here.

The issue:
Subversion management of large binary files can be very slow

The Subversion assumption:
Subversion assumes that, because of the network bandwidth needed to
move large binary files, it is more efficient to compute the delta
between the current version and the previous one (current-1) and
transmit only that delta. The argument is that if you're on a slow
modem connection or a flaky US-India cable, you'd rather pay the diff
compute time than the time taken to transmit the large files.

Why this doesn't make sense in many situations:
1. Unfortunately, many binary databases don't diff well: after even
incremental changes to the user input, the diff can be as large as
the original file.
2. Often all the users are on a local network, or the repositories are
mirrored between sites, so the available network bandwidth is very good.
3. For large files, computing the diff can take far longer than
transmitting the entire file, even under high network load.

Another complication:
After talking to various developer types, it seems that Subversion
actually computes a binary diff at both the client and the server end,
which is redundant: we end up doing twice as many (time-consuming)
diffs as we need.
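The trade-off above can be made concrete with a back-of-the-envelope
cost model. This is a minimal sketch, assuming transfer time is simply
size divided by bandwidth (no latency or protocol overhead) and that
computing a delta costs a fixed time per pass; diff_passes=2 models the
reported, unconfirmed client-plus-server diffing. The function names and
the example numbers are hypothetical, not measurements.

```python
def full_transfer_seconds(file_bytes: int, bandwidth_bps: float) -> float:
    """Seconds to send the whole file over a link of bandwidth_bps bits/s.

    Latency and protocol overhead are ignored in this simple model.
    """
    return file_bytes * 8 / bandwidth_bps


def delta_transfer_seconds(delta_bytes: int, bandwidth_bps: float,
                           diff_seconds: float, diff_passes: int = 1) -> float:
    """Seconds to compute the delta diff_passes times and then send it."""
    return diff_passes * diff_seconds + delta_bytes * 8 / bandwidth_bps


def delta_is_worth_it(file_bytes: int, delta_bytes: int, bandwidth_bps: float,
                      diff_seconds: float, diff_passes: int = 1) -> bool:
    """True if sending a delta is faster than sending the full file."""
    return (delta_transfer_seconds(delta_bytes, bandwidth_bps,
                                   diff_seconds, diff_passes)
            < full_transfer_seconds(file_bytes, bandwidth_bps))


# Hypothetical numbers: a 500 MB design database whose delta is still
# 450 MB and takes 60 s to compute, diffed at both ends (diff_passes=2).
FILE, DELTA, DIFF_S = 500_000_000, 450_000_000, 60.0
for label, bps in [("100 Mbit/s LAN", 100e6), ("1 Mbit/s WAN", 1e6)]:
    full = full_transfer_seconds(FILE, bps)
    delta = delta_transfer_seconds(DELTA, bps, DIFF_S, diff_passes=2)
    print(f"{label}: full {full:.0f} s, delta {delta:.0f} s")
```

With these made-up numbers the LAN case gives 40 s for the full file
against 156 s for the delta, while on the 1 Mbit/s link the delta still
wins narrowly (3720 s against 4000 s). That is exactly the situation
described above: the delta strategy only pays off when bandwidth is the
bottleneck and the file actually diffs well.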

The fix:
a) we need to remove the redundant diff operations that currently occur

b) one of the developers needs to profile Subversion and determine
the bottlenecks in the following scenarios:
	i) doing an initial import of a large binary file into a fresh repository
	ii) committing a new version of a large binary file to an existing repository

This will likely turn up a list of other possible improvements.
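Before reaching for a profiler, a rough wall-clock measurement of the
two scenarios shows how bad the problem is on a given machine. The
sketch below assumes the standard svnadmin and svn command-line tools
are on PATH and uses a local file:// repository, which takes the
network out of the picture so what remains is mostly client/server CPU
and disk work. The helper names and the default size are hypothetical
choices; random file content is deliberately a worst case for delta
compression. This only times the operations; locating the actual
bottlenecks still needs a real profiler run against the svn binaries.

```python
import os
import shutil
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def time_command(cmd, cwd=None):
    """Run cmd (a list of arguments), raise on failure, return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def write_random_file(path, size_bytes, chunk=1 << 20):
    """Write size_bytes of random data in chunks (worst case for deltas)."""
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n))
            remaining -= n


def profile_scenarios(size_bytes):
    """Time scenario (i), the initial import, and (ii), committing a new version."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        repos, src, wc = tmp / "repos", tmp / "src", tmp / "wc"
        url = repos.as_uri()  # file:// URL, so no network is involved
        subprocess.run(["svnadmin", "create", str(repos)], check=True)

        # (i) import a fresh large binary file into the empty repository
        src.mkdir()
        write_random_file(src / "big.bin", size_bytes)
        t_import = time_command(["svn", "import", str(src), url, "-m", "initial import"])

        # (ii) check out, overwrite the file with new content, and commit
        subprocess.run(["svn", "checkout", url, str(wc)], check=True,
                       stdout=subprocess.DEVNULL)
        write_random_file(wc / "big.bin", size_bytes)
        t_commit = time_command(["svn", "commit", str(wc), "-m", "new version"])
        return t_import, t_commit


def main(argv=None):
    """main(["256"]) times a 256 MB file; the default is 64 MB."""
    argv = sys.argv[1:] if argv is None else argv
    size_mb = int(argv[0]) if argv else 64
    if not (shutil.which("svn") and shutil.which("svnadmin")):
        print("svn/svnadmin not found on PATH; skipping", file=sys.stderr)
        return
    t_import, t_commit = profile_scenarios(size_mb << 20)
    print(f"{size_mb} MB: import {t_import:.1f} s, commit {t_commit:.1f} s")
```

Running main(["256"]) and then repeating with a file that changes only
slightly between versions would separate "diffs badly" from "diffs
slowly", which are the two distinct complaints above.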

c) On the users list, Talden suggested adding a new property to
Subversion that lets users designate files that shouldn't be diffed.
His suggested name was "svn:diffasnew"; it would instruct both client
and server to treat the file as a complete replacement. This seems
like a good suggestion.
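To make the proposed behaviour concrete, here is a toy model of how a
client and server might honour such a property. Everything in it is
hypothetical: "svn:diffasnew" is only a proposal, the function names are
invented, and prefix_suffix_delta is a deliberately simple stand-in for
Subversion's real binary delta format (svndiff), not a description of
it. The point is only the control flow: when the property is present,
neither side computes a delta and the new file replaces the old one.

```python
from typing import Optional, Tuple, Union

# Proposed in this thread; NOT an existing Subversion property.
DIFF_AS_NEW = "svn:diffasnew"

Delta = Tuple[int, int, bytes]  # (prefix_len, suffix_len, new_middle)
Payload = Tuple[str, Union[bytes, Delta]]


def prefix_suffix_delta(old: bytes, new: bytes) -> Delta:
    """Toy delta: reuse the common prefix and suffix of old, send only the middle."""
    limit = min(len(old), len(new))
    p = 0
    while p < limit and old[p] == new[p]:
        p += 1
    s = 0
    # Stop before the suffix would overlap the prefix already matched.
    while s < limit - p and old[-1 - s] == new[-1 - s]:
        s += 1
    return p, s, new[p:len(new) - s]


def prepare_upload(props: dict, old: Optional[bytes], new: bytes) -> Payload:
    """Client side: skip delta computation entirely when the property is set."""
    if DIFF_AS_NEW in props or old is None:
        return ("fulltext", new)
    return ("delta", prefix_suffix_delta(old, new))


def apply_upload(old: Optional[bytes], payload: Payload) -> bytes:
    """Server side: a fulltext simply replaces the old version, no diffing needed."""
    kind, body = payload
    if kind == "fulltext":
        return body
    p, s, middle = body
    return old[:p] + middle + old[len(old) - s:]
```

Treating the mere presence of the property as "on" mirrors how existing
properties such as svn:executable behave, so a user would only need to
set it once on each large binary file.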

I'm willing to pay a bounty for developers that are interested in  
working on this. Please contact me for more details.

Subversion is my preferred tool for my software activities, and I'd
love to be able to use it for all my design data, but unfortunately
it's just not an option at the moment.




Re: subversion performance issues (was perforce Vs subversion)

Posted by Daniel Berlin <db...@dberlin.org>.
listman wrote:
> b) one of the developers needs to profile Subversion and determine the
> bottlenecks under the following scenarios
> i) doing an initial import of a large binary file to a fresh repository
> ii) committing a new version of a large binary file to an existing repos

If you give me repositories and files I can use to simulate the
problems, I will make it take less time :)
