Posted to users@subversion.apache.org by Adrian Marsh <Ad...@ubiquisys.com> on 2007/08/20 11:29:24 UTC

Solving Large Repositories...

Hi All,

When a repos becomes very large, are there any "maintenance" procedures
to follow to archive away the "useless" data and reduce the size?

E.g., I have a repos that is now 12 GB in size, but its working copy is
300 MB, with 1 user and 50 revisions.

In a few months' or years' time, the original data of the early revisions
may be worthless (say, revision 10 and before).
I'm guessing that I'd have to create a new repos, transfer revisions 10+
into the new one, and then re-point all the clients to the new repos, but
that sounds like a nightmare (clients being out of sync, having to check
out the data again, etc.).

Surely there's an easier way?

Adrian

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org


Re: Solving Large Repositories...

Posted by David Kastrup <da...@gnu.org>.
"Adrian Marsh" <Ad...@ubiquisys.com> writes:

> Thanks, guys, for the info. I'll look into whatever "git" is, as I've
> not heard of it before.

It is the version control system used for maintaining the Linux
kernel.  Somewhat rough around the edges, but quite efficient.

-- 
David Kastrup


RE: Re: Solving Large Repositories...

Posted by da...@jpmorgan.com.
This goes back to the whole saga about the ability to remove objects
from repositories, which was discussed about a month ago:

http://svn.haxx.se/users/archive-2007-07/0573.shtml

Dg.
--
David Grierson
JPMorgan - IB Architecture - Source Code Management Consultant
GDP 228-5574 / DDI +44 141 228 5574 / Email david.x.grierson@jpmorgan.com
Sentinel House 2nd floor, 103 Waterloo Street, Glasgow G2 7BW

"Adrian Marsh" <Ad...@ubiquisys.com> wrote on 23/08/2007 09:52:

> Thanks, guys, for the info. I'll look into whatever "git" is, as I've not
> heard of it before.
>
> I found out that on one of my repos, an end user had started to upload a
> Ghost image (12 GB) to the repos, and then thought he'd "deleted" it when
> he realized his mistake. I got lucky this time: I was able to convince
> him he didn't need SVN for storing this file, and was able to remove the
> repos completely.
>
> Adrian Marsh


RE: Re: Solving Large Repositories...

Posted by Adrian Marsh <Ad...@ubiquisys.com>.
Thanks, guys, for the info. I'll look into whatever "git" is, as I've not
heard of it before.

I found out that on one of my repos, an end user had started to upload a
Ghost image (12 GB) to the repos, and then thought he'd "deleted" it when
he realized his mistake. I got lucky this time: I was able to convince
him he didn't need SVN for storing this file, and was able to remove the
repos completely.

Adrian Marsh
 

-----Original Message-----
From: Stefan Sperling [mailto:stsp@elego.de] 
Sent: 21 August 2007 11:25
To: users@subversion.tigris.org
Subject: Re: Solving Large Repositories...

On Tue, Aug 21, 2007 at 11:12:16AM +0200, David Kastrup wrote:
> > On Aug 20, 2007, at 06:29, Adrian Marsh wrote:
> >
> >> E.g., I have a repos that is now 12 GB in size, but its working copy
> >> is 300 MB, with 1 user and 50 revisions.

> Try git.  As an example, the git repository for Emacs has 89033
> revisions, a working tree size of 96MB, and a repository size of
> 170MB.  So for the price of two Subversion checkouts, you get the
> entire development history locally and can work with it off-line.

Apply this patch to git-svnimport before trying to convert
a 12 GB Subversion repository to git:

http://marc.info/?l=git&m=118554191513822&w=2

-- 
Stefan Sperling <st...@elego.de>                 Software Developer
elego Software Solutions GmbH                            HRB 77719
Gustav-Meyer-Allee 25, Gebaeude 12        Tel:  +49 30 23 45 86 96 
13355 Berlin                              Fax:  +49 30 23 45 86 95
http://www.elego.de                 Geschaeftsfuehrer: Olaf Wagner



Re: Solving Large Repositories...

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Aug 21, 2007 at 11:12:16AM +0200, David Kastrup wrote:
> > On Aug 20, 2007, at 06:29, Adrian Marsh wrote:
> >
> >> E.g., I have a repos that is now 12 GB in size, but its working copy
> >> is 300 MB, with 1 user and 50 revisions.

> Try git.  As an example, the git repository for Emacs has 89033
> revisions, a working tree size of 96MB, and a repository size of
> 170MB.  So for the price of two Subversion checkouts, you get the
> entire development history locally and can work with it off-line.

Apply this patch to git-svnimport before trying to convert
a 12 GB Subversion repository to git:

http://marc.info/?l=git&m=118554191513822&w=2

-- 
Stefan Sperling <st...@elego.de>                 Software Developer
elego Software Solutions GmbH                            HRB 77719
Gustav-Meyer-Allee 25, Gebaeude 12        Tel:  +49 30 23 45 86 96 
13355 Berlin                              Fax:  +49 30 23 45 86 95
http://www.elego.de                 Geschaeftsfuehrer: Olaf Wagner

Re: Solving Large Repositories...

Posted by David Kastrup <da...@gnu.org>.
Ryan Schmidt <su...@ryandesign.com> writes:

> On Aug 20, 2007, at 06:29, Adrian Marsh wrote:
>
>> When a repos becomes very large, are there any "maintenance" procedures
>> to follow to archive away the "useless" data and reduce the size?
>>
>> E.g., I have a repos that is now 12 GB in size, but its working copy
>> is 300 MB, with 1 user and 50 revisions.
>>
>> In a few months' or years' time, the original data of the early
>> revisions may be worthless (say, revision 10 and before).
>> I'm guessing that I'd have to create a new repos, transfer revisions
>> 10+ into the new one, and then re-point all the clients to the new
>> repos, but that sounds like a nightmare (clients being out of sync,
>> having to check out the data again, etc.).
>>
>> Surely there's an easier way?
>
> I don't think there's an easier way. Repositories are designed to
> store all data and never forget it. Coercing them to do otherwise is
> difficult, and somewhat on purpose.

Try git.  As an example, the git repository for Emacs has 89033
revisions, a working tree size of 96MB, and a repository size of
170MB.  So for the price of two Subversion checkouts, you get the
entire development history locally and can work with it off-line.

I am versioning software releases with git for some software of ours,
something like 300MB.  There have been about two dozen releases up to
now, and the repository is still smaller in size than a single
release.

As a downside, making git forget about checked-in material in a way
that actually releases the reclaimed disk space is very hard to do.
While there are tools allowing you to weed through a branch and remove
mistakenly checked in subdirectories, garbage-collecting afterwards
tends not to do anything impressive: git keeps a variety of "reflogs"
around that make it possible to recover from pretty much every mistake
by accessing branches, files, commits to any state in the last 90 days
or so.  It is quite complicated to make it forget about some piece of
local history for good right away.  It is probably easier to clone the
repository instead and throw the old one away.
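
For readers who still want to try the in-place route, here is a minimal
Python sketch of the usual two-step cleanup: expire the reflogs, then
garbage-collect. The function names purge_commands and purge are made up
for illustration. The --expire and --prune options are taken from current
git documentation and may not exist in the git releases of 2007. The
sketch also assumes the unwanted commits are no longer reachable from any
branch or tag, and it only affects the local repository.

```python
import subprocess


def purge_commands(expire="now"):
    """Build the git commands that expire reflogs and prune old objects."""
    return [
        # Drop reflog entries older than `expire` for all refs
        # (by default git keeps them for roughly 90 days).
        ["git", "reflog", "expire", f"--expire={expire}", "--all"],
        # Repack and delete unreachable loose objects older than `expire`.
        ["git", "gc", f"--prune={expire}"],
    ]


def purge(repo_dir, expire="now"):
    """Run the commands inside repo_dir; raises CalledProcessError on failure."""
    for cmd in purge_commands(expire):
        subprocess.run(cmd, cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing is deleted.
    for cmd in purge_commands():
        print(" ".join(cmd))
```

Printing the commands first and running purge() only on a throwaway clone
is the safer order, since expired reflog entries cannot be brought back.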

-- 
David Kastrup


Re: Solving Large Repositories...

Posted by Ryan Schmidt <su...@ryandesign.com>.
On Aug 20, 2007, at 06:29, Adrian Marsh wrote:

> When a repos becomes very large, are there any "maintenance" procedures
> to follow to archive away the "useless" data and reduce the size?
>
> E.g., I have a repos that is now 12 GB in size, but its working copy is
> 300 MB, with 1 user and 50 revisions.
>
> In a few months' or years' time, the original data of the early
> revisions may be worthless (say, revision 10 and before).
> I'm guessing that I'd have to create a new repos, transfer revisions
> 10+ into the new one, and then re-point all the clients to the new
> repos, but that sounds like a nightmare (clients being out of sync,
> having to check out the data again, etc.).
>
> Surely there's an easier way?

I don't think there's an easier way. Repositories are designed to  
store all data and never forget it. Coercing them to do otherwise is  
difficult, and somewhat on purpose.



Re: Solving Large Repositories...

Posted by Talden <ta...@gmail.com>.
- dump (svnadmin dump)
- filter the dump (svndumpfilter)
- load (svnadmin load)

Even if you're concerned about space, don't change the revision
numbering, or log messages and external documentation will refer to the
wrong revisions.
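
A rough Python sketch of the dump / filter / load steps above, which only
builds the command lines rather than running them. The function name
filter_pipeline and all paths are placeholders, not anything from this
thread. It deliberately leaves out svndumpfilter's --renumber-revs and
--drop-empty-revs options so that revision numbers stay as they were.
Whether svndumpfilter can cleanly exclude a given path depends on the
repository's copy history, so treat this as a starting point only.

```python
import shlex  # shlex.join needs Python 3.8 or newer


def filter_pipeline(old_repo, new_repo, excluded_paths):
    """Return shell command strings for a dump / filter / load cycle."""
    create = ["svnadmin", "create", new_repo]
    dump = ["svnadmin", "dump", old_repo]
    # No --renumber-revs / --drop-empty-revs: emptied revisions are kept,
    # so existing revision numbers remain valid in the new repository.
    filt = ["svndumpfilter", "exclude", *excluded_paths]
    load = ["svnadmin", "load", new_repo]
    pipeline = " | ".join(shlex.join(cmd) for cmd in (dump, filt, load))
    return [shlex.join(create), pipeline]


if __name__ == "__main__":
    # Hypothetical repository locations and an unwanted large file.
    for line in filter_pipeline(
        "/srv/svn/project",
        "/srv/svn/project-filtered",
        ["trunk/images/ghost.gho"],
    ):
        print(line)
```

The old repository is left untouched, so the result can be checked before
any client is pointed at the new one.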

Keep in mind that if your changes are typically evolutionary rather
than revolutionary, then the repository will not grow as much in later
revisions as it has in the early revisions.

Also consider that the capability of available infrastructure will
improve as well, so unless the repository growth outpaces this, you may
not need to eliminate history to manage repository size.

--
Talden


