Posted to users@subversion.apache.org by sureshkumar nandakumar <su...@gmail.com> on 2012/01/12 07:57:09 UTC

Space Constrain

Dear experts,

Our Subversion server runs on Red Hat Linux.

We maintain a lot of repositories on this Linux server, and each
repository takes up a large amount of space.
Our maximum size limit is 100 GB, and usage has almost reached 98%. We
run into trouble when we use the repositories from TortoiseSVN.

We are getting space constraint issues. As a temporary measure we are
deleting unused repositories on the server.
Even so, the size keeps increasing daily.

Can anyone suggest how to save space? Is there a good way to keep the
SVN server free of space constraints?
Is there any way to compress the repositories and reduce their size without any impact?

Please advise me on good practice.
Your suggestions would be very useful to me.

Re: Space Constrain

Posted by David Chapman <dc...@acm.org>.
On 1/11/2012 10:57 PM, sureshkumar nandakumar wrote:
> Dear Expert
>
> Our Subversion server is RedHat Linux.
>
> We have lot of repositories which is maintaining in Linux server. Each
> repositories taking huge size in our server.
> Our Maximum size limit is 100GB, but the size almost reached 98%. We
> are in trouble when we are using repository in Tortoise SVN.
>
> We are getting space constrain issues. For temporary purpose we
> deleting unused repositories in Server.
> Even though the size in increasing daily basis.
>
> Can anyone suggest me, how to save space. Is that any good way to keep
> it SVN server without space constrain?
> Is that any way to compress and reduce the repositories size without any impact?
>
> Please advise me  with good practice.
> Your suggestion is more use to me.
>
>

What are you storing that is so big?  Can you store only the inputs and
methods used to generate each version of these large files, rather than
the large files themselves?

-- 
     David Chapman         dcchapman@acm.org
     Chapman Consulting -- San Jose, CA
     Software Development Done Right.
     www.chapman-consulting-sj.com


Re: Space Constrain

Posted by Les Mikesell <le...@gmail.com>.
2012/1/13 Bob Archer <Bo...@amsi.com>:
>
> One issue we have is our legacy VB6 dll's that are built on every change. The dll's are put into source since most of our devs don't work on those binaries or can easily compile them. I have found that the bulk of our repo size is due to all these interim build versions. So, these are moved out of the primary source repository and put into a separate repo reference with externals. This repo can be replaced as it grows too big.

Yes, that is exactly the issue and I don't recall seeing any advice
about how to handle it or avoid the problem in the first place.  I
mostly put off doing anything about it through the subversion 1.4->1.6
revisions while 'obliterate' sounded like a possibility.  Now, that
doesn't look any closer to reality than it was years ago, so maybe
there should be some advice against getting into this situation
somewhere for people starting out.

> Now, source controlling external components is a judgement call. It might be better to just leave them in a public network folder and reference those locations in your source projects. Or, you can put them into source control, either the same repo or a separate one.

Our groups are very distributed and wouldn't all have access to a
common file share - and if they did it wouldn't be mapped the same for
everyone and wouldn't perform as well as using externals to pull the
copies in and update only on changes.  We like the functionality, just
not the practical issue on the repository side.

-- 
   Les Mikesell
     lesmikesell@gmail.com

RE: Space Constrain

Posted by Bob Archer <Bo...@amsi.com>.
> 2012/1/13 Thorsten Schöning <ts...@am-soft.de>:
> > Hello Les Mikesell,
> > on Thursday, January 12, 2012 at 18:26 you wrote:
> >
> >> We have a lot of component libraries that we want to include in
> >> larger projects without recompiling each build (i.e. we want to run
> >> known/tested instances) and have been including the binaries in tags
> >> so the headers and shared libs are versioned together.
> >
> > What's your problem at all? That you must version the same pre
> > compiled libraries as a tag in each of your projects?
> 
> It is a combination of things.  One is the long-ago decision to combine a large
> number of projects in the same repository.  The other is that our QA group
> wants to test a binary library component thoroughly, then make sure it is re-
> used instead of recompiled in each project that uses it.  Again, a decision made
> long ago for good reasons at the time, but perhaps less important now that we
> have more strictly-controlled build processes and environments.
> 
> > Are the
> > libraries such big that you run out of space?
> 
> Not strictly speaking.  That is we can deal with the overall disk space and that
> requirement won't change by breaking things up.
> However we are at a point where the time to complete a svnadmin dump/load
> cycle is becoming impractical.  I don't like a situation where we can't perform
> maintenance.
> 
> >> It''s clearly the wrong thing to do, but it works.
> >
> > I don't think so, if it saves you time and guarantees the use of
> > tested library versions, I would do the same. My in approach in my
> > company is to use a separate Repository for all kinds of libraries,
> > just version the source code and each developer has to build them on
> > his machine on it's own. The IDE just references the built libraries.
> > But we don't have that many libraries and whenever we can't build any
> > library on our own, I version them pre compiled, too.
> 
> Unconstrained growth just seems philosophically wrong - and unsustainable in
> the long run.  It might be tolerable if each component were in its own
> repository or if subversion had a reasonable approach to removing objects, but
> I can't change those things.
> 
> 
> > If one just copiess old library versions on updates etc. one can save
> > a lot of disk space.
> >
> >> How
> >> can you enforce getting exactly the right things in a parallel
> >> repository that has only the headers and libs that will work the same
> >> way for external references?
> >
> > Use tags and/or fixed revisions in your external definition.
> 
> Yes, we don't want to change the process of referencing known versions via svn
> externals in the upper level projects, we just want the binary objects and the
> necessary headers to be in a different repository.  If everything were java we
> would probably let maven handle the component object versioning, but we
> have a mix of projects.  We do use jenkins for most build activity, so a custom
> plugin to tag the build results might handle it without introducing mistakes.
> 

Externals can be pinned to a revision in your external repository. However, it probably makes more sense to use well-known paths: if you create a new repository, you can duplicate those paths by exporting your HEAD, deleting your repository, recreating it, and importing the previous export. You can take this approach for internal and external binary components. Of course, if they are internal components you could include their source in your project and just build them with that project.
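
As a rough sketch of the export / delete / recreate / import cycle described above, the Python helper below only builds the command lines and prints them, so nothing is deleted by running it. The function name, the URL and the paths are hypothetical placeholders, and the commit message is an arbitrary choice.

```python
import shlex
from typing import List


def recreate_commands(repo_url: str, repo_path: str, export_dir: str) -> List[List[str]]:
    """Build the commands for replacing a binaries repository by its own HEAD.

    repo_url   -- URL clients use to reach the repository (hypothetical)
    repo_path  -- on-disk location of the repository on the server (hypothetical)
    export_dir -- scratch directory that receives the unversioned HEAD export
    """
    return [
        # 1. Export HEAD without any .svn metadata.
        ["svn", "export", repo_url, export_dir],
        # 2. Delete the old repository, history included.
        ["rm", "-rf", repo_path],
        # 3. Create an empty repository at the same location.
        ["svnadmin", "create", repo_path],
        # 4. Import the exported tree so the well-known paths exist again.
        ["svn", "import", export_dir, repo_url, "-m", "Re-import of previous HEAD"],
    ]


if __name__ == "__main__":
    for cmd in recreate_commands(
        "http://svn.example.com/binaries",  # placeholder URL
        "/srv/svn/binaries",                # placeholder server path
        "/tmp/binaries-head",               # placeholder scratch directory
    ):
        print(shlex.join(cmd))
```

Note that revision numbers start over in the recreated repository, so externals pinned to an old revision number would no longer resolve; that is one reason to reference well-known paths instead. Hooks and configuration stored inside the old repository directory would also have to be copied over by hand.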

One issue we have is our legacy VB6 DLLs, which are built on every change. The DLLs are put into source control since most of our devs don't work on those binaries or can't easily compile them. I have found that the bulk of our repo size is due to all these interim build versions. So these are moved out of the primary source repository and put into a separate repo referenced with externals. This repo can be replaced when it grows too big.

Now, source controlling external components is a judgement call. It might be better to just leave them in a public network folder and reference those locations in your source projects. Or, you can put them into source control, either the same repo or a separate one. 

Bob





Re: Space Constrain

Posted by Les Mikesell <le...@gmail.com>.
2012/1/13 Thorsten Schöning <ts...@am-soft.de>:
> Hello Les Mikesell,
> on Thursday, January 12, 2012 at 18:26 you wrote:
>
>> We have a lot of component libraries that we
>> want to include in larger projects without recompiling each build
>> (i.e. we want to run known/tested instances) and have been including
>> the binaries in tags so the headers and shared libs are versioned
>> together.
>
> What's your problem at all? That you must version the same pre
> compiled libraries as a tag in each of your projects?

It is a combination of things.  One is the long-ago decision to
combine a large number of projects in the same repository.  The other
is that our QA group wants to test a binary library component
thoroughly, then make sure it is re-used instead of recompiled in each
project that uses it.  Again, a decision made long ago for good
reasons at the time, but perhaps less important now that we have more
strictly-controlled build processes and environments.

> Are the
> libraries such big that you run out of space?

Not strictly speaking.  That is we can deal with the overall disk
space and that requirement won't change by breaking things up.
However we are at a point where the time to complete a svnadmin
dump/load cycle is becoming impractical.  I don't like a situation
where we can't perform maintenance.

>> It''s clearly the wrong thing to do, but it works.
>
> I don't think so, if it saves you time and guarantees the use of
> tested library versions, I would do the same. My in approach in my
> company is to use a separate Repository for all kinds of libraries,
> just version the source code and each developer has to build them on
> his machine on it's own. The IDE just references the built libraries.
> But we don't have that many libraries and whenever we can't build any
> library on our own, I version them pre compiled, too.

Unconstrained growth just seems philosophically wrong - and
unsustainable in the long run.  It might be tolerable if each
component were in its own repository or if subversion had a reasonable
approach to removing objects, but I can't change those things.

> If one just copiess old library versions on updates etc. one can save
> a lot of disk space.
>
>> How
>> can you enforce getting exactly the right things in a parallel
>> repository that has only the headers and libs that will work the same
>> way for external references?
>
> Use tags and/or fixed revisions in your external definition.

Yes, we don't want to change the process of referencing known versions
via svn externals in the upper level projects, we just want the binary
objects and the necessary headers to be in a different repository.  If
everything were java we would probably let maven handle the component
object versioning, but we have a mix of projects.  We do use jenkins
for most build activity, so a custom plugin to tag the build results
might handle it without introducing mistakes.

-- 
   Les Mikesell
     lesmikesell@gmail.com

Re: Space Constrain

Posted by Thorsten Schöning <ts...@am-soft.de>.
Hello Les Mikesell,
on Thursday, January 12, 2012 at 18:26 you wrote:

> We have a lot of component libraries that we
> want to include in larger projects without recompiling each build
> (i.e. we want to run known/tested instances) and have been including
> the binaries in tags so the headers and shared libs are versioned
> together.

What exactly is your problem? That you must version the same
precompiled libraries as a tag in each of your projects? Are the
libraries so big that you run out of space?

> It''s clearly the wrong thing to do, but it works.

I don't think so. If it saves you time and guarantees the use of
tested library versions, I would do the same. My approach in my
company is to use a separate repository for all kinds of libraries,
version just the source code, and have each developer build them on
his own machine. The IDE just references the built libraries.
But we don't have that many libraries, and whenever we can't build a
library on our own, I version it precompiled, too.

If one just copies old library versions on updates etc., one can save
a lot of disk space.

> How
> can you enforce getting exactly the right things in a parallel
> repository that has only the headers and libs that will work the same
> way for external references?

Use tags and/or fixed revisions in your external definition.
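
To make that concrete, here is a small sketch that formats svn:externals definition lines in the "URL[@PEG] LOCALDIR" style understood by Subversion 1.5 and later. The helper name, the example URLs and the directory names are made up for illustration.

```python
from typing import Optional


def external_line(url: str, local_dir: str, peg_revision: Optional[int] = None) -> str:
    """Format one svn:externals definition (Subversion 1.5+ syntax).

    Without peg_revision the external follows whatever the URL points at,
    which is stable if the URL is a tag. With peg_revision the external is
    fixed to that revision of the external repository.
    """
    target = url if peg_revision is None else f"{url}@{peg_revision}"
    return f"{target} {local_dir}"


if __name__ == "__main__":
    # Pinned by tag: the tag path itself identifies the tested build.
    print(external_line("http://svn.example.com/libs/tags/foo-1.2.0", "third-party/foo"))
    # Pinned by revision: trunk as it was at r148 of the libraries repository.
    print(external_line("http://svn.example.com/libs/trunk/bar", "third-party/bar", 148))
```

The resulting lines could be written to a file and applied with "svn propset svn:externals -F FILE ." on the directory that should receive the externals.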

Best regards,

Thorsten Schöning

-- 
Thorsten Schöning       E-Mail:Thorsten.Schoening@AM-SoFT.de
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Telefon.............030-2 1001-310
Fax...............05151-  9468- 88
Mobil..............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hanover HRB 207 694 - Geschäftsführer: Andreas Muchow


Re: Space Constrain

Posted by Les Mikesell <le...@gmail.com>.
On Fri, Jan 13, 2012 at 12:27 AM, Nico Kadel-Garcia <nk...@gmail.com> wrote:
> On Fri, Jan 13, 2012 at 1:15 AM, Les Mikesell <le...@gmail.com> wrote:
>>
>> Not what I want.   I want a central canonical svn repository with
>> tagged versions of matching headers and shared libs that can be
>> predictably accessed with svn externals, I just don't want it to be
>> the same svn repository that holds the source until subversion gets
>> some features that make it feasible to remove things.  But I'd like a
>> way to ensure that the tags stay  precisely in parallel to the
>> matching source.  I don't see  how git would help with that part.
>
> You run the testing cycle on a local git working copy, where local
> changes can be made, recorded, and discarded. You then push the
> binaries, when needed, to the Subversion repository.

Local to what/who?  These are components that need to be accessible
among different groups that use subversion as the access mechanism.

-- 
  Les Mikesell
    lesmikesell@gmail.com

Re: Space Constrain

Posted by Nico Kadel-Garcia <nk...@gmail.com>.
On Fri, Jan 13, 2012 at 1:15 AM, Les Mikesell <le...@gmail.com> wrote:
> On Fri, Jan 13, 2012 at 12:06 AM, Nico Kadel-Garcia <nk...@gmail.com> wrote:
>> On Thu, Jan 12, 2012 at 12:26 PM, Les Mikesell <le...@gmail.com> wrote:
>>
>>>> Another option is to store binaries in a separate repository that you can archive and recreate monthly or quarterly, or whatever. Then you can use externals in your projects to reference them.
>>>>
>>>
>>> Is there a 'best practices" kind of writeup on how to do this
>>> correctly anywhere?   We have a lot of component libraries that we
>>> want to include in larger projects without recompiling each build
>>> (i.e. we want to run known/tested instances) and have been including
>>> the binaries in tags so the headers and shared libs are versioned
>>> together.  It''s clearly the wrong thing to do, but it works.   How
>>> can you enforce getting exactly the right things in a parallel
>>> repository that has only the headers and libs that will work the same
>>> way for external references?
>>
>> You use git, which supports tracking local changes without verbosely
>> propagating them to the central, canonical repository. This especially
>> applies to testing binaries, and can be integrated with the git/svn
>> toolkit to propagate to a more familar and existing central
>> repository.
>
> Not what I want.   I want a central canonical svn repository with
> tagged versions of matching headers and shared libs that can be
> predictably accessed with svn externals, I just don't want it to be
> the same svn repository that holds the source until subversion gets
> some features that make it feasible to remove things.  But I'd like a
> way to ensure that the tags stay  precisely in parallel to the
> matching source.  I don't see  how git would help with that part.

You run the testing cycle on a local git working copy, where local
changes can be made, recorded, and discarded. You then push the
binaries, when needed, to the Subversion repository.

Most binaries in an auto-build or development environment are not
worth keeping. This allows you to publish only those binaries that you
*want* to be reference binaries, in a more flexible fashion than most
Subversion repositories, especially because the working local branches
and tags need never be published to the main repo and clutter it up.

I've used this successfully for environments where a central
Subversion repository was mandated by policy, history, or the desire
for centralized source control.

Re: Space Constrain

Posted by Les Mikesell <le...@gmail.com>.
On Fri, Jan 13, 2012 at 12:06 AM, Nico Kadel-Garcia <nk...@gmail.com> wrote:
> On Thu, Jan 12, 2012 at 12:26 PM, Les Mikesell <le...@gmail.com> wrote:
>
>>> Another option is to store binaries in a separate repository that you can archive and recreate monthly or quarterly, or whatever. Then you can use externals in your projects to reference them.
>>>
>>
>> Is there a 'best practices" kind of writeup on how to do this
>> correctly anywhere?   We have a lot of component libraries that we
>> want to include in larger projects without recompiling each build
>> (i.e. we want to run known/tested instances) and have been including
>> the binaries in tags so the headers and shared libs are versioned
>> together.  It''s clearly the wrong thing to do, but it works.   How
>> can you enforce getting exactly the right things in a parallel
>> repository that has only the headers and libs that will work the same
>> way for external references?
>
> You use git, which supports tracking local changes without verbosely
> propagating them to the central, canonical repository. This especially
> applies to testing binaries, and can be integrated with the git/svn
> toolkit to propagate to a more familar and existing central
> repository.

Not what I want.   I want a central canonical svn repository with
tagged versions of matching headers and shared libs that can be
predictably accessed with svn externals, I just don't want it to be
the same svn repository that holds the source until subversion gets
some features that make it feasible to remove things.  But I'd like a
way to ensure that the tags stay  precisely in parallel to the
matching source.  I don't see  how git would help with that part.

-- 
   Les Mikesell
     lesmikesell@gmail.com

Re: Space Constrain

Posted by Nico Kadel-Garcia <nk...@gmail.com>.
On Thu, Jan 12, 2012 at 12:26 PM, Les Mikesell <le...@gmail.com> wrote:
> On Thu, Jan 12, 2012 at 9:20 AM, Bob Archer <Bo...@amsi.com> wrote:
>>
>>> Please advise me  with good practice.
>>> Your suggestion is more use to me.
>>
>> I think the main way to keep repos small is to NOT put binary files in it. Of course, depending on your usage that may not be practical. I think the majority opinion is hard drives are cheap.
>>
>> I know some people here have recommended some binary versioning systems that only maintains a certain number of versions back and delete older ones. I don't recall the names. Someone else can chime in with one or two.
>>
>> You could also implement something like that yourself with a build script. Store your binaries in a folder tree with a "latest" that is a symlink of the most recent version of the binaries. This way your references and such don't need to change for every version.
>>
>> Another option is to store binaries in a separate repository that you can archive and recreate monthly or quarterly, or whatever. Then you can use externals in your projects to reference them.
>>
>
> Is there a 'best practices" kind of writeup on how to do this
> correctly anywhere?   We have a lot of component libraries that we
> want to include in larger projects without recompiling each build
> (i.e. we want to run known/tested instances) and have been including
> the binaries in tags so the headers and shared libs are versioned
> together.  It''s clearly the wrong thing to do, but it works.   How
> can you enforce getting exactly the right things in a parallel
> repository that has only the headers and libs that will work the same
> way for external references?

You use git, which supports tracking local changes without verbosely
propagating them to the central, canonical repository. This especially
applies to testing binaries, and can be integrated with the git/svn
toolkit to propagate to a more familiar and existing central
repository.

Re: Space Constrain

Posted by Les Mikesell <le...@gmail.com>.
On Thu, Jan 12, 2012 at 9:20 AM, Bob Archer <Bo...@amsi.com> wrote:
>
>> Please advise me  with good practice.
>> Your suggestion is more use to me.
>
> I think the main way to keep repos small is to NOT put binary files in it. Of course, depending on your usage that may not be practical. I think the majority opinion is hard drives are cheap.
>
> I know some people here have recommended some binary versioning systems that only maintains a certain number of versions back and delete older ones. I don't recall the names. Someone else can chime in with one or two.
>
> You could also implement something like that yourself with a build script. Store your binaries in a folder tree with a "latest" that is a symlink of the most recent version of the binaries. This way your references and such don't need to change for every version.
>
> Another option is to store binaries in a separate repository that you can archive and recreate monthly or quarterly, or whatever. Then you can use externals in your projects to reference them.
>

Is there a "best practices" kind of writeup on how to do this
correctly anywhere?   We have a lot of component libraries that we
want to include in larger projects without recompiling each build
(i.e. we want to run known/tested instances) and have been including
the binaries in tags so the headers and shared libs are versioned
together.  It's clearly the wrong thing to do, but it works.   How
can you enforce getting exactly the right things in a parallel
repository that has only the headers and libs that will work the same
way for external references?

-- 
   Les Mikesell
      lesmikesell@gmail.com

RE: Space Constrain

Posted by Bob Archer <Bo...@amsi.com>.
> Our Subversion server is RedHat Linux.
> 
> We have lot of repositories which is maintaining in Linux server. Each
> repositories taking huge size in our server.
> Our Maximum size limit is 100GB, but the size almost reached 98%. We are in
> trouble when we are using repository in Tortoise SVN.
> 
> We are getting space constrain issues. For temporary purpose we deleting
> unused repositories in Server.
> Even though the size in increasing daily basis.
> 
> Can anyone suggest me, how to save space. Is that any good way to keep it SVN
> server without space constrain?
> Is that any way to compress and reduce the repositories size without any
> impact?
> 
> Please advise me  with good practice.
> Your suggestion is more use to me.

I think the main way to keep repos small is to NOT put binary files in them. Of course, depending on your usage that may not be practical. I think the majority opinion is that hard drives are cheap.

I know some people here have recommended binary versioning systems that keep only a certain number of versions and delete older ones. I don't recall the names. Someone else can chime in with one or two.

You could also implement something like that yourself with a build script. Store your binaries in a folder tree with a "latest" entry that is a symlink to the most recent version of the binaries. This way your references and such don't need to change for every version.
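
A minimal sketch of such a build-script step, in Python, might look like the following. It assumes one directory per build under a common store, version names that sort in build order (for example zero-padded build numbers), and a POSIX filesystem; the function name and the retention count are illustrative only.

```python
import os
import shutil
from pathlib import Path
from typing import List


def publish_build(store: Path, version: str, keep: int = 5) -> List[str]:
    """Make `version` the target of store/latest and prune old builds.

    Assumes version directory names sort lexically in build order.
    Returns the names of the version directories that were removed.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")

    # The build output is expected in store/<version>; create it if missing.
    (store / version).mkdir(parents=True, exist_ok=True)

    # Repoint "latest": create a temporary symlink, then rename it over the
    # old one so readers never see a missing link.
    tmp = store / "latest.tmp"
    tmp.unlink(missing_ok=True)
    tmp.symlink_to(version)  # relative link, so the store can be moved
    os.replace(tmp, store / "latest")

    # Keep only the newest `keep` version directories (symlinks are skipped).
    versions = sorted(
        p.name for p in store.iterdir() if p.is_dir() and not p.is_symlink()
    )
    stale = [v for v in versions[:-keep] if v != version]
    for name in stale:
        shutil.rmtree(store / name)
    return stale
```

Projects would then reference store/latest (or a specific version directory when they need a known build), and the store never holds more than `keep` versions.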

Another option is to store binaries in a separate repository that you can archive and recreate monthly or quarterly, or whatever. Then you can use externals in your projects to reference them.

Bob





Re: Space Constrain

Posted by Ryan Schmidt <su...@ryandesign.com>.
On Jan 12, 2012, at 01:56, Thorsten Schöning wrote:

>> Is that any way to compress and reduce the repositories size without any impact?

The repository is already stored compressed. Newer versions of Subversion store revisions in repositories more efficiently, but will not rewrite old revisions stored by older versions. For maximum space savings, dump and load, as Thorsten said below; this way, all revisions will be stored in the most efficient format.
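
For reference, a dump and load cycle could be scripted roughly as below. This is only a sketch: the function name and paths are hypothetical, and the `run` parameter exists just so the commands can be checked without touching a real repository. It relies on "svnadmin dump", "svnadmin create" and "svnadmin load", run with the newer Subversion version that should write the new repository.

```python
import subprocess
from typing import Callable


def dump_and_load(old_repo: str, new_repo: str, dump_file: str,
                  run: Callable = subprocess.run) -> None:
    """Dump old_repo to dump_file, then load it into a freshly created new_repo."""
    # Write the full history of the old repository to the dump file.
    with open(dump_file, "wb") as out:
        run(["svnadmin", "dump", "--quiet", old_repo], stdout=out, check=True)

    # Create the target repository in the current (newest) format.
    run(["svnadmin", "create", new_repo], check=True)

    # Replay every revision into the new repository.
    with open(dump_file, "rb") as src:
        run(["svnadmin", "load", "--quiet", new_repo], stdin=src, check=True)


# Example call (placeholder paths):
# dump_and_load("/srv/svn/project", "/mnt/spare/project-new", "/mnt/spare/project.dump")
```

The dump file and the new repository both need free space while the old repository still exists, so on a disk that is already 98% full they would have to go on another volume.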


> Depending on your current repository format, the repository can be
> packed and repository sharing can be used, some kind of deduplication
> which can reduce repository sizes if a lot of comparable or even
> identical files are checked in. To get maximum benefit of this a
> complete dump and load cycle of your repository is needed.
> 
> http://svnbook.red-bean.com/en/1.7/svn.reposadmin.maint.html#svn.reposadmin.maint.diskspace

The feature is called *representation* sharing, by the way.




Re: Space Constrain

Posted by Thorsten Schöning <ts...@am-soft.de>.
Hello sureshkumar nandakumar,
on Thursday, January 12, 2012 at 07:57 you wrote:

> We have lot of repositories which is maintaining in Linux server. Each
> repositories taking huge size in our server.
> Our Maximum size limit is 100GB, but the size almost reached 98%. We
> are in trouble when we are using repository in Tortoise SVN.

Who enforces the space limit, and what does it have to do with TortoiseSVN?
What is saved in your repositories: which file types, with which average
sizes? Which Subversion version are you running on the server, and in
which format are your current repositories?
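
One way to get a first answer to the file-type question is to run something like the following sketch over a checkout or export of a repository. It only measures the files currently in that tree, not the history stored on the server, so treat the numbers as a hint about which file types dominate. The function name is made up for this example.

```python
import os
from collections import defaultdict
from typing import Dict, Tuple


def sizes_by_extension(root: str) -> Dict[str, Tuple[int, int]]:
    """Map file extension -> (file count, total bytes) for the tree under root.

    Subversion's .svn administrative directories are skipped, as are symlinks.
    """
    stats = defaultdict(lambda: [0, 0])
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .svn in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".svn"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            ext = os.path.splitext(name)[1].lower() or "(none)"
            stats[ext][0] += 1
            stats[ext][1] += os.path.getsize(path)
    return {ext: (count, total) for ext, (count, total) in stats.items()}


if __name__ == "__main__":
    # Print extensions by total size, largest first, with the average size.
    report = sizes_by_extension(".")
    for ext, (count, total) in sorted(report.items(), key=lambda kv: -kv[1][1]):
        print(f"{ext:10} files={count:6} total={total:12} avg={total // count}")
```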

> Is that any good way to keep
> it SVN server without space constrain?

It mainly depends on your hardware. What impact does the increasing size
have on your users? Do they have to wait "forever" for any operation? Or
what is your real problem with the increasing sizes, just backup
problems?

> Is that any way to compress and reduce the repositories size without any impact?

Depending on your current repository format, the repository can be
packed and repository sharing can be used, some kind of deduplication
which can reduce repository sizes if a lot of comparable or even
identical files are checked in. To get maximum benefit of this a
complete dump and load cycle of your repository is needed.

http://svnbook.red-bean.com/en/1.7/svn.reposadmin.maint.html#svn.reposadmin.maint.diskspace
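
Since there are many repositories on the server, packing could be scripted. The sketch below only lists the "svnadmin pack" commands for every repository found directly under a parent directory. It assumes a repository can be recognised by a "format" file plus a "db" directory at its top level, and the parent path shown is a placeholder. Packing needs a repository format that supports it (Subversion 1.6 or later).

```python
import os
from typing import List


def find_repositories(parent: str) -> List[str]:
    """Return paths directly under `parent` that look like Subversion repositories."""
    found = []
    for entry in sorted(os.listdir(parent)):
        path = os.path.join(parent, entry)
        # Heuristic: a repository has a top-level "format" file and a "db" directory.
        if os.path.isfile(os.path.join(path, "format")) and os.path.isdir(os.path.join(path, "db")):
            found.append(path)
    return found


def pack_commands(parent: str) -> List[List[str]]:
    """Build one 'svnadmin pack' command per repository under `parent`."""
    return [["svnadmin", "pack", repo] for repo in find_repositories(parent)]


if __name__ == "__main__":
    parent = "/srv/svn"  # placeholder: directory that holds all repositories
    if os.path.isdir(parent):
        for cmd in pack_commands(parent):
            print(" ".join(cmd))
```

Packing mostly reduces the number of files and the slack space they waste; the larger savings from representation sharing only show up for revisions written after it is enabled, which is why the dump and load cycle is suggested.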

Best regards,

Thorsten Schöning
