Posted to users@subversion.apache.org by Glenn Maynard <g_...@zewt.org> on 2004/03/22 22:55:58 UTC

Disk space recovery

We have a somewhat large tree, consisting of source, text data and binary
data.  We've been managing this with CVS.

Some of the binary data is large enough that it simply takes too much space
to keep versioning indefinitely.  We don't actually need versioning for this
data; it's just convenient to store, download and synchronize the data along
with everything else.

(These are mostly MPEG/Ogg files, which don't tend to diff well.)

When commits to the larger binary files start taking too much space, I simply
go around with "cvs admin -o" on the larger files, wiping out old revisions
to clear up space.  Although this is annoying (as it's not self-maintaining),
it works fine: space is recovered, working copies are unaffected, and other
files (where we do want versioning) are unaffected.

I'd like to have the repository migrated to svn, but I can't find any way to
prevent these binary files from killing our disk space.

I've tried playing with svnadmin dump/load, by dumping the head revision
and loading it.  This hit several problems:  I have to make an interim copy
of the repository, and we don't have the space to store two whole copies at
once.  It confuses working copies, since revision numbering starts over at 1.
It loses all versioning, not just that of specific files.

I'm guessing most of this, except for the copy problem, could be worked around
by filtering svn dumps: replace the original revision of the file with the
latest, and remove all future changes.  I can't think of a safe way around the
copy problem with dump/load, and manually filtering dumps isn't all that safe
either (since I have no experience with them).

Is there a sane way to do what I need, or am I still stuck with CVS for now?

-- 
Glenn Maynard

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Disk space recovery

Posted by Ben Collins-Sussman <su...@collab.net>.
On Mon, 2004-03-22 at 16:55, Glenn Maynard wrote:

> Is there a sane way to do what I need

Nope.  The *only* way for the svn repository to lose data is to
dump/filter/reload.  Sorry.
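
A rough sketch of what dump/filter/reload can look like, assuming the stock
svnadmin and svndumpfilter tools are used.  The repository locations and the
excluded path are hypothetical placeholders, and the helper names are made up
for illustration.  Note that "svndumpfilter exclude" drops the path from every
revision, so the current copy of the files would have to be re-added
afterwards, and the loaded repository is still a second copy on disk.

```python
import shlex


def filter_pipeline(old_repo, new_repo, excluded_paths):
    """Build the argv list for each stage of dump | filter | load."""
    return [
        ["svnadmin", "dump", old_repo],
        ["svndumpfilter", "exclude", *excluded_paths],
        ["svnadmin", "load", new_repo],
    ]


def as_shell(stages):
    """Render the stages as one shell pipeline, quoting each argument."""
    return " | ".join(
        " ".join(shlex.quote(arg) for arg in argv) for argv in stages
    )


if __name__ == "__main__":
    # All three paths below are hypothetical placeholders.
    stages = filter_pipeline("/srv/svn/old", "/srv/svn/new", ["trunk/media"])
    print(as_shell(stages))
```

The script only prints the pipeline; it does not run it.  The target
repository would need to exist first (for example via "svnadmin create")
before anything is loaded into it.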




Re: Disk space recovery

Posted by Jim Correia <ji...@pobox.com>.
On Mar 22, 2004, at 6:46 PM, Neil Gower wrote:

> The "erase history" feature seems a useful feature, maybe it could be 
> wish-listed for a future version of svn?

Looks like this is already covered by Issue #516 in the issue tracker:

	<http://subversion.tigris.org/issues/show_bug.cgi?id=516>

Jim



Re: Disk space recovery

Posted by Andreas Kostyrka <an...@kostyrka.org>.
On Mon, Mar 22, 2004 at 07:51:48PM -0500, Glenn Maynard wrote:
> On Mon, Mar 22, 2004 at 06:46:53PM -0500, Neil Gower wrote:
> > Hopefully, this won't be as big an issue for svn users, due to the whole 
> > binary deltas thing.
> 
> That doesn't help with files that change fundamentally; for example, a video
> that's been resized, or an audio sample that's had its volume changed.  These
> are the types of revisions these files go through in the repository I'm using.
> 
> > The "erase history" feature seems a useful feature, maybe it could be 
> > wish-listed for a future version of svn?
> 
> A property roughly saying "don't keep history" would be extremely useful,
> and self-maintaining.  I simply don't want versioning for certain files.
Then why are you keeping them in the svn repository? I mean, that would destroy
one of the axioms of svn: that it can recover the complete filesystem layout
for any given release.

Andreas


Re: Disk space recovery

Posted by Corrin Lakeland <la...@cs.otago.ac.nz>.

On Tue, 23 Mar 2004 15:32, Ben Collins-Sussman wrote:
> Glenn Maynard wrote:
> > A property roughly saying "don't keep history" would be extremely useful,
> > and self-maintaining.  I simply don't want versioning for certain files.
>
> Then why put them in a version control system?  :-)

Because they're needed in order to run the project perhaps?  I have a similar 
setup and find it irritating that I must svn checkout my repository as well 
as scp over my data files (8.5G).

Corrin



Re: Disk space recovery

Posted by Glenn Maynard <g_...@zewt.org>.
On Tue, Mar 23, 2004 at 03:56:33PM +0200, Nuutti Kotivuori wrote:
> Remember that Subversion works based on diffs sent over the
> network. So if some of your users have an older working copy and wish
> to get up to date, the repository must have the older version to
> compare against it. Changing the system to send full-texts in these
> cases wouldn't be a minor change.

Especially if the "don't version" is, itself, a versioned property.

"svn obliterate" would have to deal with this, too: if a WC has revision
1, and we delete that revision, "svn update" on that copy would have to
receive a full copy.
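
A toy illustration of that fallback, not how Subversion actually encodes or
transmits deltas.  The function name, the "stored" mapping and the prefix-based
delta are all assumptions made up for this sketch: the server sends a delta
when it still holds the working copy's base revision, and the full text when
that revision has been removed.

```python
def update_payload(stored, wc_rev, head_rev):
    """Decide what to send to a working copy sitting at wc_rev.

    stored maps revision number -> file content still kept on the server.
    """
    target = stored[head_rev]
    base = stored.get(wc_rev)
    if base is None:
        # The base revision was obliterated, so no delta can be computed.
        return ("fulltext", target)
    # Toy delta: length of the shared prefix plus the differing tail.
    n = 0
    while n < min(len(base), len(target)) and base[n] == target[n]:
        n += 1
    return ("delta", (n, target[n:]))


if __name__ == "__main__":
    print(update_payload({1: b"abcxyz", 2: b"abcdef"}, 1, 2))
    print(update_payload({2: b"abcdef"}, 1, 2))
```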

> Some day "svn obliterate" will come into existence, and hopefully it
> will support removing only a subset of versions on a file.

Well, not just removing: replacing with the current (or the next).  For
example, if a file has two revisions, 1 and 2, and we remove 1, it should
actually replace 1 with 2.  That way, if we look at revision 1 of the
repository, the file isn't missing; we simply see the latest revision,
instead of what was really there at the time.  (The issue of branches
needing to be rewritten wouldn't affect me, at least; we won't be branching
this data.)

Also, I hope "svn obliterate" won't need to write out a second copy of
the entire db; that'd have the same problem as svnadmin dump/filter/load:
requiring a lot of free disk space, making it useless for disk space
recovery.

-- 
Glenn Maynard


Re: Disk space recovery

Posted by Nuutti Kotivuori <na...@iki.fi>.
Glenn Maynard wrote:
> It's our main repository of files.  Versioning is important for many
> of the files, but other functions are important for all files:
> distributing data to everyone working on the project, and keeping
> everyone up to date with everyone else, which are things CVS and SVN
> are both very good at.

On a conceptual level, just not versioning some files seems rather
straightforward, but in practise, it's going to be hard to implement.

Remember that Subversion works based on diffs sent over the
network. So if some of your users have an older working copy and wish
to get up to date, the repository must have the older version to
compare against it. Changing the system to send full-texts in these
cases wouldn't be a minor change.

Some day "svn obliterate" will come into existence, and hopefully it
will support removing only a subset of versions on a file.

I will soon face the same problem myself, and will probably solve
it with a separate file area that is rsync'd or simply mounted.

-- Naked



Re: Disk space recovery

Posted by Martin Tomes <li...@tomes.org.uk>.
Glenn Maynard wrote:

> On Mon, Mar 22, 2004 at 09:32:26PM -0600, Ben Collins-Sussman wrote:
> 
>>>A property roughly saying "don't keep history" would be extremely useful,
>>>and self-maintaining.  I simply don't want versioning for certain files.
>>Then why put them in a version control system?  :-)
> It's our main repository of files.  Versioning is important for many of the
> files, but other functions are important for all files: distributing data to
> everyone working on the project, and keeping everyone up to date with
> everyone else, which are things CVS and SVN are both very good at.

We are facing exactly the same problem.  Everyone in our project needs 
up-to-date binaries, but they can't all build them; currently the binaries 
are checked into CVS.  Our plan is to create one script which copies these 
binaries into either a file share or a WebDAV share, and another script 
which copies them back out again, so the builder of these things can 
update the shared copy and the users can get them back into the right place.

Another possibility is to use rsync to put/get the unversioned binaries.

We will also have a versioned release tree containing binaries which 
have been released into the wild.

-- 
Martin Tomes
echo 'Martin x Tomes at controls x eurotherm x co x uk'\
  | sed -e 's/ x /\./g' -e 's/ at /@/'


Re: Disk space recovery

Posted by Glenn Maynard <g_...@zewt.org>.
On Thu, Mar 25, 2004 at 10:25:36AM +0100, Andreas Kostyrka wrote:
> But did you consider how your proposition would work out:

For the cases you asked for, actually, yes.  (I didn't discuss them
because there doesn't appear to be interest in this feature right now
from anyone that could implement it.)

> 1) What to do with svn -r 100 repopath?
>    That one is easy, so I'll try to answer it: just give the r110 version of
>    nohistory.
> 2) What about revision 20? I mean, the directory where nohistory is stored
>    doesn't even exist yet.

If the file exists at that revision, supply the file; otherwise, don't.
Committing the file and then running "svn diff -rPREV:HEAD" on it would
show no changes.

These cases are easy; there are probably lots of much harder ones, and
plenty of internal implementation details that would make doing this
difficult.  That's why what I gave is a suggestion, not a detailed
feature proposal; I don't know SVN internals well enough to do that.
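
The rule suggested above can be modelled in a few lines.  This is only a
sketch of the idea, not anything Subversion implements: the class and its
fields are hypothetical, and it assumes the file's existence is still tracked
per revision while only the newest content is kept.

```python
from dataclasses import dataclass


@dataclass
class NoHistoryFile:
    """Hypothetical file whose existence is versioned but whose content is not."""

    added_in: int        # revision in which the file first appeared
    latest: bytes = b""  # only the newest content is stored

    def commit(self, content: bytes) -> None:
        # Overwrite the stored content instead of keeping the old version.
        self.latest = content

    def content_at(self, revision: int):
        # Before the file was added there is nothing to supply.
        if revision < self.added_in:
            return None
        # Any later revision sees the latest content.
        return self.latest


if __name__ == "__main__":
    f = NoHistoryFile(added_in=100, latest=b"r100 data")
    f.commit(b"r110 data")
    print(f.content_at(20), f.content_at(100), f.content_at(110))
```

With the r100/r110 example from the quoted message, revision 20 yields
nothing, and revisions 100 and 110 yield identical content, which is why a
diff between them would show no changes.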

-- 
Glenn Maynard



Re: Disk space recovery

Posted by Andreas Kostyrka <an...@kostyrka.org>.
On Mon, Mar 22, 2004 at 11:17:42PM -0500, Glenn Maynard wrote:
> On Mon, Mar 22, 2004 at 09:32:26PM -0600, Ben Collins-Sussman wrote:
> > >A property roughly saying "don't keep history" would be extremely useful,
> > >and self-maintaining.  I simply don't want versioning for certain files.
> > 
> > Then why put them in a version control system?  :-)
> 
> It's our main repository of files.  Versioning is important for many of the
> files, but other functions are important for all files: distributing data to
> everyone working on the project, and keeping everyone up to date with
> everyone else, which are things CVS and SVN are both very good at.
But did you consider how your proposition would work out:
a) svn works on filesets, not files.
b) all repositories start out empty at r0.

Now consider:
r0: start
r50: mkdir /path/to/bigfiles
r100: add /path/to/bigfiles/nohistory
r110: change /path/to/bigfiles/nohistory

Now you do not want nohistory to store history information ;)

Ok, answer the following questions:

1) What to do with svn -r 100 repopath?
   That one is easy, so I'll try to answer it: just give the r110 version of
   nohistory.
2) What about revision 20? I mean, the directory where nohistory is stored
   doesn't even exist yet.

Andreas



Re: Disk space recovery

Posted by Glenn Maynard <g_...@zewt.org>.
On Mon, Mar 22, 2004 at 09:32:26PM -0600, Ben Collins-Sussman wrote:
> >A property roughly saying "don't keep history" would be extremely useful,
> >and self-maintaining.  I simply don't want versioning for certain files.
> 
> Then why put them in a version control system?  :-)

It's our main repository of files.  Versioning is important for many of the
files, but other functions are important for all files: distributing data to
everyone working on the project, and keeping everyone up to date with
everyone else, which are things CVS and SVN are both very good at.

-- 
Glenn Maynard


Re: Disk space recovery

Posted by Ben Collins-Sussman <su...@collab.net>.
Glenn Maynard wrote:

> A property roughly saying "don't keep history" would be extremely useful,
> and self-maintaining.  I simply don't want versioning for certain files.

Then why put them in a version control system?  :-)


Re: Disk space recovery

Posted by Glenn Maynard <g_...@zewt.org>.
On Mon, Mar 22, 2004 at 06:46:53PM -0500, Neil Gower wrote:
> Hopefully, this won't be as big an issue for svn users, due to the whole 
> binary deltas thing.

That doesn't help with files that change fundamentally; for example, a video
that's been resized, or an audio sample that's had its volume changed.  These
are the types of revisions these files go through in the repository I'm using.

> The "erase history" feature seems a useful feature, maybe it could be 
> wish-listed for a future version of svn?

A property roughly saying "don't keep history" would be extremely useful,
and self-maintaining.  I simply don't want versioning for certain files.

-- 
Glenn Maynard


Re: Disk space recovery

Posted by Neil Gower <ne...@digitalextremes.com>.
Glenn Maynard wrote:

> When commits to the larger binary files start taking too much space, I simply
> go around with "cvs admin -o" on the larger files, wiping out old revisions
> to clear up space.  Although this is annoying (as it's not self-maintaining),
> it works fine: space is recovered, working copies are unaffected, and other
> files (where we do want versioning) are unaffected.
> 
> I'd like to have the repository migrated to svn, but I can't find any way to
> prevent these binary files from killing our disk space.

Hopefully, this won't be as big an issue for svn users, due to the whole 
binary deltas thing.

On the other hand, if you keep them around long enough, I'm sure the 
history of binary files will still get big with svn.  I do the same 
thing with our CVS repository (admin -o), after backing up to offline 
storage.

The "erase history" feature seems a useful feature, maybe it could be 
wish-listed for a future version of svn?


Neil.



