Posted to users@subversion.apache.org by Helmut Zeisel <HZ...@gmx.at> on 2012/02/28 11:08:15 UTC

Subversion for object code

Our software products use different components, and many base components are shared between products. Not every developer needs to compile these components themselves, so the components are distributed in binary form (as object files or libraries).
To make the distribution of binaries easier, some developers have started to include object files and libraries of the common base components in the Subversion repository.

I know that subversion was not invented for that purpose, but from our experience it seems that it actually works.

My questions:

What kind of problems could occur if too many binaries are in the subversion archive?

How well does Subversion diff object code?

What better options are available for sharing versions of object files?

Helmut
 


Re: Subversion for object code

Posted by Daniel Shahaf <da...@elego.de>.

Helmut Zeisel wrote on Fri, Mar 02, 2012 at 10:28:16 +0100:
> 
> -------- Original Message --------
> > Date: Thu, 1 Mar 2012 09:07:41 +0200
> > From: Daniel Shahaf <da...@elego.de>
> 
> > Representation sharing works only on complete files.  (If two files are
> > not byte-for-byte identical, it never kicks in.)  What you see would be
> > the xdelta binary-diff algorithm being efficient.
> 
> OK. 
> 
> Does this mean that svn internally uses the same algorithm as xdelta (http://code.google.com/p/xdelta/)?
> 

I don't know offhand how close our implementation is to other
implementations or to research papers.  There are some comments in
subversion/libsvn_delta/xdelta.c (lines 35 and 370) that suggest some
divergence from other sources.

CC'ing dev@ so people who know this code better can join the
conversation.

> This would mean that I could directly use 
> 
> xdelta3 -s SOURCE TARGET > OUT
> 
> to estimate how big the differences between binary files are (without creating an svn test repository and measuring the repository size).
> 
> Helmut

Re: Subversion for object code

Posted by Helmut Zeisel <HZ...@gmx.at>.

-------- Original Message --------
> Date: Thu, 1 Mar 2012 09:07:41 +0200
> From: Daniel Shahaf <da...@elego.de>

> Representation sharing works only on complete files.  (If two files are
> not byte-for-byte identical, it never kicks in.)  What you see would be
> the xdelta binary-diff algorithm being efficient.

OK. 

Does this mean that svn internally uses the same algorithm as xdelta (http://code.google.com/p/xdelta/)?

This would mean that I could directly use 

xdelta3 -s SOURCE TARGET > OUT

to estimate how big the differences between binary files are (without creating an svn test repository and measuring the repository size).
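
A rough, tool-independent first estimate can also be sketched in a few lines of Python. This is neither xdelta nor Subversion's delta code, only a much cruder block-matching stand-in: it indexes fixed-size blocks of the old file and counts the bytes of the new file that cannot be copied from one of them. The function name and the block size of 64 are assumptions made for illustration.

```python
def estimate_delta_bytes(source: bytes, target: bytes, block: int = 64) -> int:
    """Roughly count the target bytes that cannot be copied from source."""
    # Index every block-aligned, full-length chunk of the old file.
    known = {source[i:i + block]
             for i in range(0, len(source) - block + 1, block)}
    literal = 0
    pos = 0
    while pos < len(target):
        if target[pos:pos + block] in known:
            pos += block      # whole block can be copied from the old file
        else:
            literal += 1      # this byte would have to be stored as new data
            pos += 1
    return literal
```

Calling it on the contents of two builds of the same library (read with `pathlib.Path.read_bytes()`) gives a number of bytes to compare against the file size. A real delta encoder should normally do better than this sketch, so the result is best read as a pessimistic guess.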

Helmut


Re: Subversion for object code

Posted by Daniel Shahaf <da...@elego.de>.

Helmut Zeisel wrote on Wed, Feb 29, 2012 at 10:41:13 +0100:
> My interpretation of this is that "representation-sharing" works well
> enough for compiled C++ code, i.e. if I change 2% of source code and
> commit the changed objects and libs, then typically a comparable
> amount (+/- 2%) of the object code is additionally added to the
> repository and the remaining +/- 98% can be reused from previously
> committed object code.

Representation sharing works only on complete files.  (If two files are
not byte-for-byte identical, it never kicks in.)  What you see would be
the xdelta binary-diff algorithm being efficient.
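
To illustrate the distinction, here is a toy Python sketch of whole-file sharing. It is not Subversion's code; the class name and the choice of a SHA-1 digest as the lookup key are assumptions for illustration. A file is reused only when its complete content has been seen before, so a single changed byte defeats the sharing and any savings then have to come from the binary delta instead.

```python
import hashlib


class WholeFileStore:
    """Toy store that keeps one copy per distinct file content."""

    def __init__(self):
        self.blobs = {}  # SHA-1 hex digest -> file content

    def add(self, content: bytes) -> bool:
        """Store content; return True if an existing copy was reused."""
        key = hashlib.sha1(content).hexdigest()
        if key in self.blobs:
            return True
        self.blobs[key] = content
        return False


store = WholeFileStore()
lib = b"\x7fELF" + b"\x00" * 100          # stand-in for a library file
first = store.add(lib)                    # new content, stored
second = store.add(lib)                   # byte-for-byte identical, shared
third = store.add(lib[:-1] + b"\x01")     # one byte differs, stored again
```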

Re: Subversion for object code

Posted by Helmut Zeisel <HZ...@gmx.at>.

> Date: Tue, 28 Feb 2012 13:43:05 +0100
> From: Stephen Butler <sb...@elego.de>

> > Actually this "representation-sharing" was my question. How well does
> > it work for compiled C++ code? How much does the repository typically grow?

> It's best to write a simple script that loads various versions of your
> own files into an empty repository.  Using the default FSFS backend,
> you can see the size of each commit.

OK, I have now run some experiments with a typical library
(2 MB object code, 2 MB library size) compiled with Visual C++.

Just completely recompiling the library (unchanged source) and committing again gives a repository growth of 7 KB (0.15% of the "commit size").

If I add one member function (to approx. 40), which means that I also changed the header and had to recompile some other classes, and commit, the repository grows by 130 KB (3% of the "commit size").

My interpretation of this is that "representation-sharing" works well enough for compiled C++ code, i.e. if I change 2% of the source code and commit the changed objects and libs, then typically a comparable amount (+/- 2%) of the object code is added to the repository and the remaining +/- 98% can be reused from previously committed object code.
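
The percentages can be rechecked with a short Python calculation. It assumes the "commit size" is the 2 MB of object code plus the 2 MB library, i.e. about 4 MB in total; that total and the names used are assumptions, the 7 KB and 130 KB figures are the measured ones.

```python
def growth_percent(repo_growth_kb: float, commit_size_kb: float) -> float:
    """Repository growth as a percentage of the committed data size."""
    return 100.0 * repo_growth_kb / commit_size_kb


# Assumed commit size: 2 MB of object files plus a 2 MB library.
COMMIT_KB = 4 * 1024

rebuild_only = growth_percent(7, COMMIT_KB)     # unchanged source, full rebuild
added_method = growth_percent(130, COMMIT_KB)   # one new member function

print(f"rebuild only: {rebuild_only:.2f}%")     # prints "rebuild only: 0.17%"
print(f"added method: {added_method:.2f}%")     # prints "added method: 3.17%"
```

Under that assumption the results agree with the rounded figures quoted above.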

Helmut




Re: Subversion for object code

Posted by Stephen Butler <sb...@elego.de>.

On Feb 28, 2012, at 12:35 , Helmut Zeisel wrote:

[...]

>> There's a quick summary of binary-file handling here:
>> 
>>  http://svnbook.red-bean.com/en/1.7/svn.forcvs.binary-and-trans.html
>> 
>> On the server side, Subversion stores files using a binary diff algorithm,
>> and has a "representation-sharing" feature for avoiding redundant data
>> storage.
> 
> Actually this "representation-sharing" was my question. How well does it work for compiled C++ code? How much does the repository typically grow?

That's hard to say without knowing your parameters.  Most importantly, 
how compressible are the files?

I've seen mainly-text repositories (like Subversion's own Subversion
repository) grow rather slowly.  I've seen a repository containing 
nearly-incompressible audio files grow at > 1 GB per week.

It's best to write a simple script that loads various versions of your
own files into an empty repository.  Using the default FSFS backend,
you can see the size of each commit.
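
One possible shape for such a script, as a minimal Python sketch. It assumes the `svn` and `svnadmin` command-line clients are on the PATH, that `work` is an existing empty directory, and that each entry of `versions` is a directory holding one build of your files; the function names are made up for illustration. Instead of looking at individual revision files it measures the growth of the whole repository directory per commit, which is simpler but slightly less precise.

```python
import shutil
import subprocess
from pathlib import Path


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files below path."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def measure_growth(versions: list[Path], work: Path) -> list[int]:
    """Commit each directory in turn; return repository growth per commit."""
    repo, wc = work / "repo", work / "wc"
    subprocess.run(["svnadmin", "create", str(repo)], check=True)
    subprocess.run(["svn", "checkout", "-q", repo.resolve().as_uri(), str(wc)],
                   check=True)
    growth = []
    for number, version in enumerate(versions, start=1):
        before = dir_size(repo)
        # Overwrite the working copy with this version's files.
        # Files that disappeared between versions are not deleted here.
        shutil.copytree(version, wc, dirs_exist_ok=True)
        # --no-ignore: the default ignore patterns cover *.o, *.a and *.so.
        subprocess.run(["svn", "add", "-q", "--force", "--no-ignore", "."],
                       cwd=wc, check=True)
        subprocess.run(["svn", "commit", "-q", "-m", f"version {number}"],
                       cwd=wc, check=True)
        growth.append(dir_size(repo) - before)
    return growth
```

A call such as `measure_growth([Path("build1"), Path("build2")], Path("scratch"))` (directory names hypothetical) returns one byte count per commit.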

Regards,
Steve

Re: Subversion for object code

Posted by Nico Kadel-Garcia <nk...@gmail.com>.

On Tue, Feb 28, 2012 at 6:35 AM, Helmut Zeisel <HZ...@gmx.at> wrote:

> > Date: Tue, 28 Feb 2012 11:47:43 +0100
> > From: Stephen Butler <sb...@elego.de>
>
> Thank you for the quick answer.
>
> > But it also has disadvantages:
> >
> > - Runaway repository growth.  Object files and .jar files don't compress
> > as well as text.  If you bust a hard limit for your repository disk space,
> > your IT service provider might force you to pay a drastic penalty.
>
> This is what I am afraid of; see also below ("representation-sharing")
>
This would be alleviated by the ability to "obliterate" material, which is
a longstanding feature request but has proven resistant to implementation.
It seems to go against the underlying philosophies, as well as being quite
tricky to do.

Binaries don't store well, and svn "delete" doesn't really eliminate bulky
material, it just hides it. With this in mind, you might benefit from using
a clean "tags" structure, or a structure that actually dumps and loads tags
into a new source repo and builds and stores the binaries *there*, to
separate them from your main repository. This is beyond the scope of most
normal layouts, but could be invaluable for a regression testing structure
where you do need to be able to record altered binaries. By fragmenting
them from the main code line, you lose the ability to merge code changes
back to the main code lines, but for preserving reference binaries in a way
that can be used by svn:externals, it might be invaluable.




> > - Slower checkouts, updates, and merges due to working copy size
>
> To prevent this, we can make a suitable separation of binaries and sources
> in the directory tree.
>
> > Also, you'll miss the features of language-specific dep-mgt tools, which
> > have a lot of sanity checks built in.  A few examples:
> >
> >   Java:  Maven
> >   Python:  virtualenv + pip
> >   Ruby:  bundler + gem
>
> We use C++.
>
> > > How well does Subversion diff object code?
> >
> > By default, 'svn diff' skips binary files.  You can customize it to use
> > another program to display diffs for, say, "*.o" files.
>
> This we do not need.
>
> > There's a quick summary of binary-file handling here:
> >
> >   http://svnbook.red-bean.com/en/1.7/svn.forcvs.binary-and-trans.html
> >
> > On the server side, Subversion stores files using a binary diff algorithm,
> > and has a "representation-sharing" feature for avoiding redundant data
> > storage.
>
> Actually this "representation-sharing" was my question. How well does it
> work for compiled C++ code? How much does the repository typically grow?
>
> > > What better options are available for sharing versions of object files?
> >
> > That depends on your programming language.
>
> C++
>
> Helmut

Re: Subversion for object code

Posted by Helmut Zeisel <HZ...@gmx.at>.

> Date: Tue, 28 Feb 2012 11:47:43 +0100
> From: Stephen Butler <sb...@elego.de>

Thank you for the quick answer.

> But it also has disadvantages:
> 
> - Runaway repository growth.  Object files and .jar files don't compress 
> as well as text.  If you bust a hard limit for your repository disk space,
> your IT service provider might force you to pay a drastic penalty.

This is what I am afraid of; see also below ("representation-sharing")
 
> - Slower checkouts, updates, and merges due to working copy size

To prevent this, we can make a suitable separation of binaries and sources in the directory tree.

> Also, you'll miss the features of language-specific dep-mgt tools, which
> have a lot of sanity checks built in.  A few examples:
> 
>   Java:  Maven
>   Python:  virtualenv + pip
>   Ruby:  bundler + gem

We use C++.

> > How well does Subversion diff object code?
> 
> By default, 'svn diff' skips binary files.  You can customize it to use another
> program to display diffs for, say, "*.o" files.

This we do not need.

> There's a quick summary of binary-file handling here:
> 
>   http://svnbook.red-bean.com/en/1.7/svn.forcvs.binary-and-trans.html
> 
> On the server side, Subversion stores files using a binary diff algorithm,
> and has a "representation-sharing" feature for avoiding redundant data
> storage.

Actually this "representation-sharing" was my question. How well does it work for compiled C++ code? How much does the repository typically grow?

> > What better options are available for sharing versions of object files?
> 
> That depends on your programming language.

C++

Helmut

Re: Subversion for object code

Posted by Les Mikesell <le...@gmail.com>.

On Tue, Feb 28, 2012 at 4:47 AM, Stephen Butler <sb...@elego.de> wrote:
> But it also has disadvantages:
>
> - Runaway repository growth.  Object files and .jar files don't compress
> as well as text.  If you bust a hard limit for your repository disk space,
> your IT service provider might force you to pay a drastic penalty.

The bigger issue here is that you have to 'svnadmin dump/filter/load'
the entire repository for any maintenance like removing large objects
that are committed accidentally or something that should not be there
for legal reasons.    This operation takes even more space and will
require more downtime as the repository grows. This problem becomes
even worse if you put multiple projects in a single large repository.

> - Slower checkouts, updates, and merges due to working copy size

You can commit the libraries and executables only under tags (at a
slight break with convention) so they are only checked out by
externals when/where you specifically want them.

>> What better options are available for sharing versions of object files?
> That depends on your programming language.

Which is unfortunate, because a lot of groups of developers use
multiple languages but want to share the same infrastructure.  So
component-level subversion versioning may be the best you can do.

-- 
   Les Mikesell
     lesmikesell@gmail.com

Re: Subversion for object code

Posted by Stephen Butler <sb...@elego.de>.

On Feb 28, 2012, at 11:08 , Helmut Zeisel wrote:

> Our software products use different components, and many base components are shared between products. Not every developer needs to compile these components themselves, so the components are distributed in binary form (as object files or libraries).
> To make the distribution of binaries easier, some developers have started to include object files and libraries of the common base components in the Subversion repository.
> 
> I know that subversion was not invented for that purpose, but from our experience it seems that it actually works.
> 
> My questions:
> 
> What kind of problems could occur if too many binaries are in the subversion archive?

Subversion handles binary files well.  It's commonly used to store the
"golden master" files of software releases.  

It sounds like you're using Subversion for dependency management,
too.  This has some advantages:

- Safe archiving (what's committed stays committed)

- Simple checkout procedure

- No extra admin required

But it also has disadvantages:

- Runaway repository growth.  Object files and .jar files don't compress 
as well as text.  If you bust a hard limit for your repository disk space, 
your IT service provider might force you to pay a drastic penalty.

- Slower checkouts, updates, and merges due to working copy size

Also, you'll miss the features of language-specific dep-mgt tools, which
have a lot of sanity checks built in.  A few examples:

  Java:  Maven
  Python:  virtualenv + pip
  Ruby:  bundler + gem

As a fallback, if you don't have an off-the-shelf dependency manager,
try Subversion externals.

  http://svnbook.red-bean.com/en/1.7/svn.advanced.externals.html

> 
> How well does Subversion diff object code?

By default, 'svn diff' skips binary files.  You can customize it to use another
program to display diffs for, say, "*.o" files.

There's a quick summary of binary-file handling here:

  http://svnbook.red-bean.com/en/1.7/svn.forcvs.binary-and-trans.html

On the server side, Subversion stores files using a binary diff algorithm,
and has a "representation-sharing" feature for avoiding redundant data
storage.

> 
> What better options are available for sharing versions of object files?

That depends on your programming language.

Regards,
Steve

Re: Subversion for object code

Posted by Ulrich Eckhardt <ul...@dominolaser.com>.

On 28.02.2012 14:42, Helmut Zeisel wrote:
>> BTW: Another approach is to use a build server that e.g. runs
>> nightly and stores the results on a network share.
>
> This does not work in our situation because for some components we do
> not want to use the latest but some stable older version.

We have multiple releases here and the trunk. For the releases, we check 
out and compile from the tags. For the trunk, we check out the latest 
and greatest and compile all that. Needless to say, that now and then 
the trunk fails to build since someone forgot to compile for one or two 
targets or so, but that's exactly what this build server is intended to 
catch.

If I'm going to rework something in a way where I expect failures, I 
don't do so on the trunk but in a feature branch in order to isolate 
colleagues from the changes until they are stable.

We export the different versions to a share on the network, and some of 
those are also committed as binaries, especially when they go to a customer.

Uli
**************************************************************************************
Domino Laser GmbH, Fangdieckstraße 75a, 22547 Hamburg, Germany
Managing Director: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
**************************************************************************************
Visit our website at http://www.dominolaser.com
**************************************************************************************
This e-mail, including all attachments, is intended only for the addressee and may contain confidential information. Please notify the sender immediately if you are not the intended recipient. In that case the e-mail must be deleted and may not be read, forwarded, published, or otherwise used.
E-mails can be read by third parties and may contain viruses and unauthorized changes. Domino Laser GmbH is not responsible for these consequences.
**************************************************************************************

Re: Subversion for object code

Posted by Helmut Zeisel <HZ...@gmx.at>.

> Date: Tue, 28 Feb 2012 14:06:14 +0100
> From: Ulrich Eckhardt <ul...@dominolaser.com>
> To: users@subversion.apache.org
> Subject: Re: Subversion for object code


> > I know that subversion was not invented for that purpose
> 
> I tend to disagree a bit, Subversion was intended to version non-textual 
> resources, too, and you are not doing something that is unsupported and 
> discouraged and left without support either.
> 
> 
> > but from our experience it seems that it actually works.
> 
> Yes, I'd even call it common use, even though it's not the majority.

Quite interesting; this is good news.

> All files are binaries. The only issue here is the size they take, 
> because every change requires some storage in the repository.

Has anyone already measured how much additional storage incremental commits of object files (e.g. compiled from C++ source) consume?

> You probably already know the typical trunk/branches/tags hierarchy. 
> Keep this hierarchy, but also keep it clean of object files, because 
> merging and diffing them just doesn't work. Instead, add a fourth folder 
> with release packages (e.g. bin). 

OK, we already do this. (Actually we do not use a fourth top-level folder, but we have a clear separation of source and bin at another level in the hierarchy.)

> BTW: Another approach is to use a build server that e.g. runs nightly 
> and stores the results on a network share. 

This does not work in our situation because for some components we do not want to use the latest but some stable older version.

Helmut

Re: Subversion for object code

Posted by Ulrich Eckhardt <ul...@dominolaser.com>.

On 28.02.2012 11:08, Helmut Zeisel wrote:
> Our software products use different components, and many base
> components are shared between products. Not every developer needs to
> compile these components themselves, so the components are
> distributed in binary form (as object files or libraries). To
> make the distribution of binaries easier, some developers have started
> to include object files and libraries of the common base components
> in the Subversion repository.
>
> I know that subversion was not invented for that purpose

I tend to disagree a bit, Subversion was intended to version non-textual 
resources, too, and you are not doing something that is unsupported and 
discouraged and left without support either.


> but from our experience it seems that it actually works.

Yes, I'd even call it common use, even though it's not the majority.


> My questions:
>
> What kind of problems could occur if too many binaries are in the
> subversion archive?

All files are binaries. The only issue here is the size they take, 
because every change requires some storage in the repository. All other 
issues like the history size and lookup time are probably negligible.


> How well does Subversion diff object code?

SVN can save network bandwidth by only transmitting partial deltas of 
any file. The problem is that e.g. compilers embed the compile date into 
their object files, so compiling something twice will cause the output 
to differ. If you then try to merge a change into that file, the part 
representing the date will surely fail, and barring any human-readable 
representation as for text files, you can't even resolve this conflict 
manually.
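
To see how much two builds of the same source really differ, a small Python sketch like the following lists the byte ranges that changed. The function name is made up for illustration, and whether a given toolchain embeds a timestamp at all is something to check for your own compiler.

```python
def differing_ranges(a: bytes, b: bytes) -> list[tuple[int, int]]:
    """Return (start, end) offsets, end exclusive, where a and b differ."""
    ranges = []
    start = None
    common = min(len(a), len(b))
    for offset in range(common):
        if a[offset] != b[offset]:
            if start is None:
                start = offset            # a differing run begins here
        elif start is not None:
            ranges.append((start, offset))
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        # Extra bytes in the longer file are reported as a trailing range.
        ranges.append((common, max(len(a), len(b))))
    return ranges
```

Feeding it the contents of two object files built from identical source (read with `pathlib.Path.read_bytes()`) shows whether the differences are confined to a few short ranges, as one would expect from an embedded date, or spread across the file.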

> What better options are available for sharing versions of object files?

You probably already know the typical trunk/branches/tags hierarchy. 
Keep this hierarchy, but also keep it clean of object files, because 
merging and diffing them just doesn't work. Instead, add a fourth folder 
with release packages (e.g. bin). After making a tag, you commit a 
compiled version of that tag including all necessary binaries to the bin 
folder, so anyone can check out that folder and use the content without 
compiling. This keeps the non-mergeable binaries separate but allows 
anyone that doesn't want to compile from scratch to just retrieve a 
usable setup.

In no case would I check in binaries as part of regular development, 
like in trunk or branches, as those changes just can't be merged. Also, 
mixing changes made by the developer and changes derived from those 
changes makes it harder to understand what exactly was done. In 
practice, you can even commit changes that simply don't correlate if you 
don't pay attention!

BTW: Another approach is to use a build server that e.g. runs nightly 
and stores the results on a network share. It starts with "rm -rf ..." 
and then checks out and builds each component from scratch. Components 
that don't build or just build on one developer's machine don't pass 
unnoticed that way. You could also mix the two, and only create release 
packages from those "official" builds of the build server once they 
passed the automated build and unit testing.

Good luck, I hope I gave you some good ideas!

Uli