Posted to users@subversion.apache.org by Th...@bahn.de on 2004/10/07 09:35:38 UTC

Howto prevent svn from storing deltas for binary files

Hi there.

I would like to put incoming deliveries from a subcompany under version
control. Most of the files are binary (C/C++ libraries).

When doing a usability test with svn I found that a library of about 9 MB
increased the repository size by about 24-28 MB.
My conclusion is that a library, being the result of a compilation, might
change completely, so that the storage needed to track the changes is
larger than the original file itself. Second, switching between versions
takes too much time (about 30 seconds for a total of 13 MB, with a local
repository accessed via file://).

Is it possible to prevent svn from tracking the changes, so that it simply
stores a clean copy instead of a diff?

Cheers, Thorsten




---------

This e-mail may contain confidential and/or privileged information. If you
are not the intended recipient (or have received this e-mail in error)
please notify the sender immediately and destroy this e-mail. Any
unauthorised copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.

----------



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org


Re: Howto prevent svn from storing deltas for binary files

Posted by Erik Huelsmann <eh...@gmail.com>.
Hi,

> I would like to put incoming deliveries from a subcompany under version
> control. Most of the files are binary (C/C++ libraries).

> When doing a usability test with svn I found that a library of about 9 MB
> increased the repository size by about 24-28 MB.

> My conclusion is that a library, being the result of a compilation, might
> change completely, so that the storage needed to track the changes is
> larger than the original file itself.

Did you verify which files caused this increase in size? I think it
may be caused by unused logs lying around in your repository's db/
directory.
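
One way to check which files are responsible is to record the size of every
file in the repository directory before and after a commit and compare the two
snapshots. The following is a minimal sketch of that idea, not a Subversion
tool: the helper names size_by_file and growth are made up for illustration,
and the repository path is whatever your setup uses. For BDB repositories,
svnadmin list-unused-dblogs reports log files that are no longer needed.

```python
import os


def size_by_file(root):
    """Return {relative_path: size_in_bytes} for every regular file under root."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a plain file.
            if os.path.isfile(full) and not os.path.islink(full):
                sizes[os.path.relpath(full, root)] = os.path.getsize(full)
    return sizes


def growth(before, after):
    """Return [(path, added_bytes)] for new or grown files, largest growth first."""
    deltas = [(path, size - before.get(path, 0)) for path, size in after.items()]
    grown = [item for item in deltas if item[1] > 0]
    return sorted(grown, key=lambda item: item[1], reverse=True)
```

Take one snapshot with size_by_file(repository_path) before the commit and one
after it, then print the result of growth(before, after). If most of the growth
sits in log files rather than in the data tables, the diffing routines are
unlikely to be the cause.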

Also, the increase may well be 24 MB the first time you import it, but
how much of an increase do you see the second time you commit a library
to the repository? I don't think BDB gives unused space in its db files
back to the filesystem; it reuses that space on later occasions.

> Second, switching between versions
> takes too much time (about 30 seconds for a total of 13 MB, with a local
> repository accessed via file://).

This is possibly because the RA (repository access) layer is generic and
unaware that there is no network communication. So it calculates a
diff against the version you have in your tree, and after 'sending' it,
the diff is applied, which costs a lot of CPU. Unfortunately, there
is currently no way to tell Subversion you are working locally.

> Is it possible to prevent svn from tracing the changes and simply store a
> clean copy instead of a diff ?

No. I'm sorry, but I don't think seeing an increase of 24MB on
committing a file of 9 MB is caused by Subversion's diffing routines.

bye,

Erik.


Re: Howto prevent svn from storing deltas for binary files

Posted by Scott Palmer <sc...@2connected.org>.
On Oct 7, 2004, at 1:21 PM, Benjamin Pflugmann wrote:

> It is much more probable that the size increase comes from side effects
> that have nothing to do with the diffing algorithm (like BDB log
> files, or the fact that BDB doesn't free empty space in its database).

It would be interesting to compare the results with fsfs (though 
Thorsten didn't specify which back-end was in use).

>> Or you just managed to hit a pathological case.
>
> That would be compressed or encrypted files, since those - by
> definition - shouldn't be very compressible anymore.

Exactly.  Though such things can easily be part of what you want to put 
in the repository, e.g. MPEG files for game cut-scenes or some other 
form of compressed or encrypted data that would be part of a software 
project.  (Of course, use of Subversion is not limited to software 
development either.)

Such files though shouldn't add much more to the used disk space than 
it would take for a complete copy of the new version of the file.  It 
certainly shouldn't be storing things as zillions of tiny diffs if it 
sees that the result is much larger than the original file size.


Scott



Re: Howto prevent svn from storing deltas for binary files

Posted by Benjamin Pflugmann <be...@pflugmann.de>.
On Thu 2004-10-07 at 10:27:33 -0400, you wrote
> On Oct 7, 2004, at 5:35 AM, Thorsten.Huhn@bahn.de wrote:
> 
> >When doing a usability test with svn I found that a library of
> >about 9 MB increased the repository size by about 24-28 MB.  My
> >conclusion is that a library, being the result of a compilation,
> >might change completely, so that the storage needed to track the
> >changes is larger than the original file itself.
> 
> Sounds like a broken binary-diff algorithm to me.

It is much more probable that the size increase comes from side effects
that have nothing to do with the diffing algorithm (like BDB log
files, or the fact that BDB doesn't free empty space in its database).

> Or you just managed to hit a pathological case.

That would be compressed or encrypted files, since those - by
definition - shouldn't be very compressible anymore.
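
The point about compressed or encrypted input can be illustrated with a small
experiment. This is only a sketch using Python's zlib module, not Subversion's
own delta or compression code; random bytes from os.urandom stand in for
compressed or encrypted data, and the sample C source line is made up.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller means better compression)."""
    return len(zlib.compress(data, 9)) / len(data)


# Highly repetitive input, standing in for ordinary redundant data.
repetitive = b"int add(int a, int b) { return a + b; }\n" * 4096

# Random bytes, standing in for already compressed or encrypted data.
random_like = os.urandom(len(repetitive))

print("repetitive :", round(compression_ratio(repetitive), 4))
print("random-like:", round(compression_ratio(random_like), 4))
```

The repetitive input shrinks to a small fraction of its size, while the
random-like input stays at roughly its original size, since there is no
redundancy left for the compressor to remove.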

Regards,

	Benjamin.


Re: Howto prevent svn from storing deltas for binary files

Posted by Branko Čibej <br...@xbc.nu>.
Scott Palmer wrote:

> On Oct 7, 2004, at 5:35 AM, Thorsten.Huhn@bahn.de wrote:
>
>> When doing a usability test with svn I found that a library of about
>> 9 MB increased the repository size by about 24-28 MB.
>> My conclusion is that a library, being the result of a compilation,
>> might change completely, so that the storage needed to track the
>> changes is larger than the original file itself.
>
>
> Sounds like a broken binary-diff algorithm to me.

Subversion will never store a delta if the delta is larger than the 
original data, which as you note can conceivably happen for certain edge 
cases (although I've never seen a single case in real life, even with 
compressed files).
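
The rule "keep the delta only if it is smaller than the full text" can be
sketched as follows. This is an illustration under stated assumptions, not
Subversion's code: the delta format here is a toy one (shared prefix and
suffix lengths plus the changed middle part), and the names make_delta,
apply_delta and choose_representation are hypothetical.

```python
import struct

HEADER = ">II"  # toy header: shared prefix length, shared suffix length
HEADER_SIZE = struct.calcsize(HEADER)


def make_delta(old: bytes, new: bytes) -> bytes:
    """Toy delta: record the prefix/suffix shared with old, store only new's middle."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # The suffix must not overlap the prefix in either file.
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    middle = new[prefix:len(new) - suffix]
    return struct.pack(HEADER, prefix, suffix) + middle


def apply_delta(old: bytes, delta: bytes) -> bytes:
    """Rebuild the new data from old plus a delta produced by make_delta."""
    prefix, suffix = struct.unpack(HEADER, delta[:HEADER_SIZE])
    tail = old[len(old) - suffix:] if suffix else b""
    return old[:prefix] + delta[HEADER_SIZE:] + tail


def choose_representation(old: bytes, new: bytes):
    """Return ("delta", data) only when the delta is smaller than the full text."""
    delta = make_delta(old, new)
    if len(delta) < len(new):
        return ("delta", delta)
    return ("fulltext", new)
```

With this rule, a file that changes completely costs at most its own size,
because the toy delta (changed bytes plus header) is never smaller than the
full text in that case and the full text is stored instead.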

The increase of the database size is caused by BDB's page preallocation 
scheme and the fact that the database files aren't shrunk when data is 
logically removed from the database. If you try several similar commits 
one after another, you'll notice that the space overhead remains roughly 
constant -- actually close to the total size of the last commit.
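
A toy model shows why the overhead would stay roughly constant across
commits. It is a sketch under loud assumptions and does not reproduce BDB's
real allocator: space is counted in abstract pages, the file never shrinks,
freed pages are reused by later allocations, and each commit stores the new
version in full while the previous version is rewritten as a delta. The names
GrowOnlyStore and simulate are hypothetical.

```python
class GrowOnlyStore:
    """Toy page store: freed pages are reused, but the file never shrinks."""

    def __init__(self):
        self.file_pages = 0  # pages allocated on disk (high-water mark)
        self.free_pages = 0  # pages inside the file that are free for reuse

    def allocate(self, pages):
        # Use free pages first; grow the file only for the remainder.
        reused = min(pages, self.free_pages)
        self.free_pages -= reused
        self.file_pages += pages - reused

    def release(self, pages):
        # Logically removed data frees pages, but the file keeps its size.
        self.free_pages += pages


def simulate(commits, fulltext_pages, delta_pages):
    """Return the unused-page overhead inside the file after each commit."""
    store = GrowOnlyStore()
    overheads = []
    for i in range(commits):
        store.allocate(fulltext_pages)       # new version stored in full
        if i > 0:
            store.allocate(delta_pages)      # previous version rewritten as a delta
            store.release(fulltext_pages)    # its old full text becomes free space
        overheads.append(store.free_pages)
    return overheads


print(simulate(5, 100, 10))
```

In this model the unused space inside the file settles at the size of one
full text after the second commit and stays there, while the file itself grows
only by the size of one delta per commit.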

-- Brane



Re: Howto prevent svn from storing deltas for binary files

Posted by Scott Palmer <sc...@2connected.org>.
On Oct 7, 2004, at 5:35 AM, Thorsten.Huhn@bahn.de wrote:

> When doing a usability test with svn I found that a library of about
> 9 MB increased the repository size by about 24-28 MB.
> My conclusion is that a library, being the result of a compilation,
> might change completely, so that the storage needed to track the
> changes is larger than the original file itself.

Sounds like a broken binary-diff algorithm to me.  Or you just managed 
to hit a pathological case.  But needing three times the data to 
represent the change is really bad, seeing that the whole point of 
storing diffs is to make the data smaller.


Scott

