Posted to users@subversion.apache.org by Matthew England <me...@mengland.net> on 2004/11/12 23:42:03 UTC

Any problems w/ very large (10-100MB) binary files?

Has anyone tried controlling/storing/managing very large (as in 10MB to 
100MB) binary files in Subversion?

Has anyone had any problems doing so?

-Matt

ps: Sorry if this is a FAQ. It's hard to get a narrow search on "large 
files." ;)  



Re: Any problems w/ very large (10-100MB) binary files?

Posted by Neil Martin <sv...@orangeappeal.co.uk>.
For disk-saving purposes I would like to be able to extract the history of a
file (or all files) in the repos to a file that can be backed up. I would
also like to be able to insert the extracted file back so that the history
of the file(s) is recoverable (if need be).
Is it possible to find out how much space a file is using in the repos?

----- Original Message ----- 
From: "Glenn Maynard" <g_...@zewt.org>
To: <us...@subversion.tigris.org>
Sent: Saturday, November 13, 2004 11:25 AM
Subject: Re: Any problems w/ very large (10-100MB) binary files?


> On Fri, Nov 12, 2004 at 09:28:34PM -0600, Ben Collins-Sussman wrote:
>> You're right.  Glenn's objection is that SVN cannot easily lose data,
>> while CVS can.
>
> Not quite.  My objection is that SVN can't delete data, while CVS can.
> Losing data and deleting data are different things.
>
> My filesystem, running XFS, has never lost data.  I delete files every
> day.  Being able to delete data and never losing data are orthogonal.
>
> It seems to me that the "don't lose data" design consideration was taken
> a little too far, and "lose data" ended up including "removing data
> intentionally".
>
> Anyhow, I understand this not being a high priority; I'm just registering
> myself periodically as somebody who could really use this.  :)
>
> -- 
> Glenn Maynard
>



Re: Any problems w/ very large (10-100MB) binary files?

Posted by kf...@collab.net.
Glenn Maynard <g_...@zewt.org> writes:
> Not quite.  My objection is that SVN can't delete data, while CVS can.
> Losing data and deleting data are different things.
> 
> My filesystem, running XFS, has never lost data.  I delete files every
> day.  Being able to delete data and never losing data are orthogonal.
> 
> It seems to me that the "don't lose data" design consideration was taken
> a little too far, and "lose data" ended up including "removing data
> intentionally".
> 
> Anyhow, I understand this not being a high priority; I'm just registering
> myself periodically as somebody who could really use this.  :)

The reason we haven't implemented 'svn obliterate' is solely technical
difficulty -- it's hard to do.  It is not a philosophical position
against deleting data, it's just a question of priorities.

Hope this helps clarify things,
-Karl


Re: Any problems w/ very large (10-100MB) binary files?

Posted by Glenn Maynard <g_...@zewt.org>.
On Fri, Nov 12, 2004 at 09:28:34PM -0600, Ben Collins-Sussman wrote:
> You're right.  Glenn's objection is that SVN cannot easily lose data, 
> while CVS can.

Not quite.  My objection is that SVN can't delete data, while CVS can.
Losing data and deleting data are different things.

My filesystem, running XFS, has never lost data.  I delete files every
day.  Being able to delete data and never losing data are orthogonal.

It seems to me that the "don't lose data" design consideration was taken
a little too far, and "lose data" ended up including "removing data
intentionally".

Anyhow, I understand this not being a high priority; I'm just registering
myself periodically as somebody who could really use this.  :)

-- 
Glenn Maynard


Re: Any problems w/ very large (10-100MB) binary files?

Posted by Ben Collins-Sussman <su...@collab.net>.
On Nov 12, 2004, at 9:10 PM, Matthew England wrote:

> Hi Ben,
>
> Thanks for the background info.  That seems to be good to know.
>
> At 11/12/2004 08:57 PM, Ben Collins-Sussman wrote:
>> Glenn's objection is that an RCS file is 'hackable' by hand... that 
>> you can manually hack out old versions of a file (if you really know 
>> what you're doing.)  A subversion repository has no way of losing 
>> data, ever... it's a database that's not hackable.  The only recourse 
>> is to [dump | filter | reload].
>
> I'm reading Glenn's "objection" differently.

Whoops, I misworded my response.  ;-(

You're right.  Glenn's objection is that SVN cannot easily lose data, 
while CVS can.

So yes, if you plan to keep things under version control and want to 
purge history now and then, Subversion is definitely not the tool for 
you.  It's possible with SVN, but not easy.  (At least, not as easy as 
hacking an RCS file.)  We intend to address this someday, but have no 
concrete plans yet.



Re: Any problems w/ very large (10-100MB) binary files?

Posted by Matthew England <me...@mengland.net>.
Hi Ben,

Thanks for the background info.  That seems to be good to know.

At 11/12/2004 08:57 PM, Ben Collins-Sussman wrote:
>Glenn's objection is that an RCS file is 'hackable' by hand... that you 
>can manually hack out old versions of a file (if you really know what 
>you're doing.)  A subversion repository has no way of losing data, ever... 
>it's a database that's not hackable.  The only recourse is to [dump | 
>filter | reload].

I'm reading Glenn's "objection" differently.  I'm hearing the "larger 
picture" as the inability to delete unneeded revisions of a file.  While in 
many ways it sounds good that SVN will never "lose" data/revs...in some 
cases it's bad.

E.g., let's say I'm rev-controlling a lot of content/collateral for a 
product-training class.  Lots of documents, powerpoints, and even large 
video files.  Some of these video files are HUGE.  50MB or bigger, some of 
them.  If I have multiple revs of each one of those stacked up, all of a 
sudden my server's storage capacity is taking a big byte (and maybe SVN 
performance does, too?).

I'd like to selectively delete some of the older revs of the video files 
where I'm sure I'll never need them again.  They've almost completely 
changed in content from one rev to the next (for various reasons: movie 
resolution change, a reshoot, etc.)--even though they are still addressing 
the same thing and should keep the same name and logical revisioning scheme.

This is the problem I want to solve....and it's what I hear Glenn saying, 
too.  Maybe I'm simply not understanding things (either Glenn or the 
SVN-specific stuff) correctly?

-Matt  



Re: Any problems w/ very large (10-100MB) binary files?

Posted by "Mihai T. Lazarescu" <mi...@pobox.com>.
On Fri, 12 Nov 2004, Matthew England wrote:

> My vote would be to first have a per-file-revision-deletion mechanism.
>
> I'm still wondering: can CVS do this?  If it can and Subversion can not, I 
> might need to go get CVS fired up on my server (which I've never done 
> before).  (Maybe Ben/Glenn already answered this and I'm simply not 
> understanding things...I'm just trying to  make sure before I haul off and 
> install a CVS server.)

Excerpt from the `info cvs' documentation:

     admin options
     -------------
     [...]
     `-oRANGE'
          Deletes ("outdates") the revisions given by RANGE.

          Note that this command can be quite dangerous unless you know
          _exactly_ what you are doing (for example see the warnings below
          about how the REV1:REV2 syntax is confusing).

          If you are short on disc this option might help you.  But think
          twice before using it--there is no way short of restoring the
          latest backup to undo this command!  If you delete different
          revisions than you planned, either due to carelessness or (heaven
          forbid) a CVS bug, there is no opportunity to correct the error
          before the revisions are deleted.  It probably would be a good
          idea to experiment on a copy of the repository first.
     [...]
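
For illustration (hypothetical filename and revision numbers), the command
is run against a checked-out file and looks roughly like:

     cvs admin -o1.3:1.5 intro-video.mpg    # delete revisions 1.3 through 1.5
     cvs admin -o1.7 intro-video.mpg        # delete just revision 1.7

Per the warning above, experiment on a copy of the repository first.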

Cheers,

Mihai


Re: Any problems w/ very large (10-100MB) binary files?

Posted by Andre Majorel <ay...@teaser.fr>.
On 2004-11-12 23:48 -0500, Christopher Ness wrote:
> On Fri, 2004-11-12 at 22:15, Matthew England wrote:
> > However, I probably would not use it in the example I mention in
> > this thread.  In such a case, I would want to delete *some* of the
> > video files, but I would still want to keep others besides the head
> > revision.
> 
> Have you thought of creating a read-only directory and making each of
> your revisions read-only as well?  You could even put a log file in the
> directory.
> 
> It's old technology, but for videos and large files that cannot be
> "diff'd" the only advantage I can see of a Revision Control System (RCS)
> is to keep metadata about the files.
> 
> I think you need to question your design and goals for your project. 
> What are you trying to achieve by putting these files in an RCS?

He's trying to use Subversion to implement a partially versioned
filesystem on a platform that does not support the concept at
the OS level. What's wrong with that?

-- 
André Majorel <URL:http://www.teaser.fr/~amajorel/>
Do not use this account for regular correspondence.
See the URL above for contact information.


Re: Any problems w/ very large (10-100MB) binary files?

Posted by Christopher Ness <ch...@nesser.org>.
On Fri, 2004-11-12 at 22:15, Matthew England wrote:
> However, I probably would not use it in the example I mention in this 
> thread.  In such a case, I would want to delete *some* of the video files, 
> but I would still want to keep others besides the head revision.

Have you thought of creating a read-only directory and making each of
your revisions read-only as well?  You could even put a log file in the
directory.

It's old technology, but for videos and large files that cannot be
"diff'd" the only advantage I can see of a Revision Control System (RCS)
is to keep metadata about the files.
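
A minimal sketch of that approach (paths and filenames are made up), with a
plain-text log standing in for commit messages:

     mkdir -p /srv/training/videos
     cp intro-take2.mpg /srv/training/videos/
     chmod a-w /srv/training/videos/intro-take2.mpg
     echo "2004-11-12  intro-take2.mpg  reshoot at higher resolution" \
         >> /srv/training/videos/CHANGELOG

Old takes you are sure you no longer need can then simply be rm'd.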

I think you need to question your design and goals for your project. 
What are you trying to achieve by putting these files in an RCS?

Cheers,
Chris Ness
-- 
Software Engineering IV,
McMaster University
PGP Public Key: http://www.nesser.org/pgp-key/
23:37:05 up 14:25, 5 users, load average: 0.09, 0.10, 0.15 
http://www.fsf.org/philosophy/no-word-attachments.html

Re: Any problems w/ very large (10-100MB) binary files?

Posted by Matthew England <me...@mengland.net>.
At 11/12/2004 09:11 PM, Trent Mick wrote:
>Perforce has an option to "only store head revision of a file" (by
>setting the +S modifier on a file's type). This can be useful for
>certain types of large files -- obviously only if one doesn't care
>about recovering old revisions.

That's an interesting feature.  I can see that being useful in some cases.

However, I probably would not use it in the example I mention in this 
thread.  In such a case, I would want to delete *some* of the video files, 
but I would still want to keep others besides the head revision.

>Is there a feature request for this for Subversion?

My vote would be to first have a per-file-revision-deletion mechanism.

I'm still wondering: can CVS do this?  If it can and Subversion can not, I 
might need to go get CVS fired up on my server (which I've never done 
before).  (Maybe Ben/Glenn already answered this and I'm simply not 
understanding things...I'm just trying to  make sure before I haul off and 
install a CVS server.)

-Matt 



Re: Any problems w/ very large (10-100MB) binary files?

Posted by Trent Mick <tr...@gmail.com>.
Perforce has an option to "only store head revision of a file" (by
setting the +S modifier on a file's type). This can be useful for
certain types of large files -- obviously only if one doesn't care
about recovering old revisions.
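
(If I have the Perforce syntax right, the modifier is part of the filetype
and is applied when the file is added or reopened, e.g.:

     p4 add -t binary+S big-video.mpg      # store only the head revision
     p4 reopen -t binary+S big-video.mpg   # change the type of an already-open file

Treat that as a sketch from memory, not gospel.)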

Is there a feature request for this for Subversion?

Cheers,
Trent

-- 
Trent Mick
trentm@gmail.com


Re: Any problems w/ very large (10-100MB) binary files?

Posted by Ben Collins-Sussman <su...@collab.net>.
On Nov 12, 2004, at 8:29 PM, Matthew England wrote:

>
>> For my use, this is the only serious drawback of SVN compared to 
>> CVS--serious
>> enough that I'm forced to keep CVS around for these files, 
>> unfortunately ...
>
> So CVS apparently handles these "undiffable" files (or simply large 
> binary files) differently?   How so?  Does CVS allow one to delete 
> unneeded revisions of a file (when Subversion does not as per your 
> description below)?

When you commit a new version of a binary file to CVS,

   - the client sends the entire file to the server
   - the server stores the entire file.

If you look at the corresponding RCS file in the CVS repository, it's 
just a concatenation of every full version of the file.

When you commit a new version of *any* file to SVN (text or binary, svn 
doesn't distinguish),

   - the client sends only diffs to the server
   - the server stores the diffs (details vary, depending on BDB vs. 
FSFS back-end)

But as Glenn said, the problem is that sometimes a binary file is 
already compressed, or changing the binary file results in nearly every 
byte in the file being altered.  When that happens, Subversion ends up 
doing just as much work as CVS.  The 'diff' being sent is as big as the 
file itself, and so is the diff being stored.

So in general, SVN is more efficient than CVS when it comes to 
transmitting and storing binary files.  In certain edge cases, it's 
equally inefficient.  It's never worse.
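
(A quick way to see this effect for yourself, using a throwaway repository
and a couple of stand-in video files; --fs-type needs a 1.1 or later svnadmin:

     svnadmin create --fs-type fsfs /tmp/repo
     svn checkout file:///tmp/repo /tmp/wc
     cd /tmp/wc
     cp /somewhere/take1.mpg video.mpg
     svn add video.mpg && svn commit -m "first cut"
     du -sh /tmp/repo
     cp /somewhere/take2.mpg video.mpg     # near-total content change
     svn commit -m "reshoot"
     du -sh /tmp/repo                      # grows by roughly the full file size

The same test with a text file that changes only slightly shows almost no
growth.)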

Glenn's objection is that an RCS file is 'hackable' by hand... that you 
can manually hack out old versions of a file (if you really know what 
you're doing.)  A subversion repository has no way of losing data, 
ever... it's a database that's not hackable.  The only recourse is to 
[dump | filter | reload].
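
(For the record, the dump/filter/load workflow looks roughly like this; note
that svndumpfilter works on whole paths, so pruning individual revisions of
a single file still means editing the dump stream by hand:

     svnadmin dump /path/to/repos > repos.dump
     svndumpfilter exclude trunk/videos/old-take.mpg < repos.dump > filtered.dump
     svnadmin create /path/to/new-repos
     svnadmin load /path/to/new-repos < filtered.dump

The paths are placeholders, and the repository should be taken offline, or
at least kept read-only, while this runs.)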



Re: Any problems w/ very large (10-100MB) binary files?

Posted by Matthew England <me...@mengland.net>.
>For my use, this is the only serious drawback of SVN compared to CVS--serious
>enough that I'm forced to keep CVS around for these files, unfortunately ...

So CVS apparently handles these "undiffable" files (or simply large binary 
files) differently?   How so?  Does CVS allow one to delete unneeded 
revisions of a file (when Subversion does not as per your description below)?

At 11/12/2004 06:07 PM, Glenn Maynard wrote:
>There's no way to selectively remove unneeded revisions of a file from
>the repository, short of dumping the db and hacking out the revisions
>(which is dangerous, requires downtime, and requires a lot of free disk
>space).

Ouch.

Does the fsfs flavor of Subversion make this any easier, by any chance?

-Matt 



Re: Any problems w/ very large (10-100MB) binary files?

Posted by Glenn Maynard <g_...@zewt.org>.
On Fri, Nov 12, 2004 at 05:42:03PM -0600, Matthew England wrote:
> Has anyone tried controlling/storing/managing very large (as in 10MB to 
> 100MB) binary files in Subversion?
> 
> Has anyone had any problems doing so?

The only problem I've had is that, if the files are undiffable (e.g.
compressed files, which tend to change completely when modified), each
commit causes the repository to unrecoverably grow by the size of the file.
There's no way to selectively remove unneeded revisions of a file from
the repository, short of dumping the db and hacking out the revisions
(which is dangerous, requires downtime, and requires a lot of free disk
space).

For my use, this is the only serious drawback of SVN compared to CVS--serious
enough that I'm forced to keep CVS around for these files, unfortunately ...

-- 
Glenn Maynard
