Posted to users@subversion.apache.org by qu...@in-euro.de on 2004/10/13 12:17:52 UTC

Is FSFS compressing stored data?

Hello,

I did some testing of the differences between FSFS repos and BDB repos, and it
looks as though FSFS repos must be compressed. Is this true?
I found no information about it in chapter 5 of the book. Btw, FSFS repos were
slightly slower than BDB repos in my tests. For example, I imported a small
project of only 8.5 MB. The import took 6 s with BDB and 13 s with FSFS, but the
resulting size was 5.8 MB for FSFS and 13 MB for BDB. The project was a mixture
of dirs, sources, binaries and tars.
If FSFS compresses files, is it possible to do the same in BDB too?

Greets

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Is FSFS compressing stored data?

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
Scott Palmer wrote:
> 
> On Oct 13, 2004, at 8:05 PM, Greg Hudson wrote:
> 
>>> They both compress.  The reasons BDB is bigger are complex, and have
>>> to do with how databases allocate storage.
>>
>>
>> Er, no, BDB stores the head revision in uncompressed plaintext,
>> whereas FSFS stores all file revisions as deltas against something.
>> (The first rev of a file is stored as a delta against empty.)
>>
> 
> How does FSFS store the HEAD revision then?  If it needs to process 
> deltas to get the HEAD revision that would lead to performance problems 
> would it not?  I was under the impression that the usual way to store 
> versions is to use reverse deltas from the HEAD revision to get the 
> older revision, so generally the HEAD revision is stored verbatim (as 
> you say it is for BDB).

It uses skip deltas, so it only needs to apply at most log(n) deltas to 
get to any given revision.  It's a little bit slower, but not noticeable 
in my experience.

-garrett


Re: Is FSFS compressing stored data?

Posted by Scott Palmer <sc...@2connected.org>.
On Oct 13, 2004, at 8:05 PM, Greg Hudson wrote:

>> They both compress.  The reasons BDB is bigger are complex, and have
>> to do with how databases allocate storage.
>
> Er, no, BDB stores the head revision in uncompressed plaintext,
> whereas FSFS stores all file revisions as deltas against something.
> (The first rev of a file is stored as a delta against empty.)
>

How does FSFS store the HEAD revision then?  If it needs to process 
deltas to get the HEAD revision that would lead to performance problems 
would it not?  I was under the impression that the usual way to store 
versions is to use reverse deltas from the HEAD revision to get the 
older revision, so generally the HEAD revision is stored verbatim (as 
you say it is for BDB).

Scott



Re: Is FSFS compressing stored data?

Posted by Mark Phippard <Ma...@softlanding.com>.
Greg Hudson <gh...@MIT.EDU> wrote on 10/14/2004 11:55:21 AM:

> > I believe Greg's point is that the presence of these full-texts, 
> > especially if you have a lot of branches, accounts for a lot of the 
> > difference in the repository sizes.  Also, if I understand Greg, even the
> > head revision in fsfs undergoes some degree of compression, such as 
> > stripping out whitespace etc...
> 
> Stripping out whitespace?  No.
> 
> In FSFS, the first rev of a file is stored as a delta against empty,
> which is a form of compression comparable to gzip (though not quite as
> good).  Later revs of the file are stored as deltas against earlier
> revs--but not always the most immediate earlier rev.  See
> http://svn.collab.net/repos/svn/trunk/notes/skip-deltas for details.

To us mortals, a delta against empty would imply the same end result as a 
full text.  I know you didn't mean that, so I was suggesting that there is 
some degree of compression even in that scenario. 

What I meant by stripping out whitespace is that if I had 100 consecutive 
spaces, the algorithm would probably compress them down to something far 
less than 100 bytes.  I have no idea what the delta algorithm, or gzip, 
actually does; I just thought that might be something that we lay persons 
might understand.

Thanks

Mark
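
The 100-spaces example above can be tried directly. The sketch below uses
Python's zlib module purely as a stand-in for "something comparable to gzip";
it is not the delta format Subversion itself uses, and the variable names are
only illustrative.

```python
import zlib

# 100 consecutive spaces, as in the example in the message above.
spaces = b" " * 100

# zlib (the DEFLATE algorithm behind gzip) is used here only as a
# stand-in; Subversion's own delta encoding is a different format.
packed = zlib.compress(spaces)

# The round trip is lossless: nothing is "stripped out".
assert zlib.decompress(packed) == spaces

print("original bytes:  ", len(spaces))
print("compressed bytes:", len(packed))
```

The compressed form is much shorter than the input, yet decompressing it
returns all 100 spaces, which is the distinction between compressing
whitespace and stripping it.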




Re: Is FSFS compressing stored data?

Posted by Greg Hudson <gh...@MIT.EDU>.
On Thu, 2004-10-14 at 10:34, Mark Phippard wrote:
> kfogel@newton.ch.collab.net wrote on 10/14/2004 08:37:08 AM:
> > Sure, the head revision is uncompressed.  But the vast majority of
> > data in the repository is compressed, and therefore I think it's fair,
> > when speaking in general terms, to say that the repository is
> > compressed.  (Unless the question was specifically about head, which I
> > may have missed.)

The historical data in the repository may not be "the vast majority of
the data" in some repositories.  For instance, my work repository is
primarily an integration repository--imports of external source code
with a few local mods.  Although we do import new versions of the
external source code periodically (thus creating history), the ratio of
head data to history is quite large.

> I believe Greg's point is that the presence of these full-texts, 
> especially if you have a lot of branches, accounts for a lot of the 
> difference in the repository sizes.  Also, if I understand Greg, even the 
> head revision in fsfs undergoes some degree of compression, such as 
> stripping out whitespace etc...

Stripping out whitespace?  No.

In FSFS, the first rev of a file is stored as a delta against empty,
which is a form of compression comparable to gzip (though not quite as
good).  Later revs of the file are stored as deltas against earlier
revs--but not always the most immediate earlier rev.  See
http://svn.collab.net/repos/svn/trunk/notes/skip-deltas for details.
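
The "not always the most immediate earlier rev" rule can be sketched in a few
lines. This is a minimal illustration, assuming the scheme described in the
skip-deltas notes: each revision of a file is numbered by how many
predecessors it has, and its delta base is found by clearing the lowest set
bit of that number. The thread itself does not spell the rule out, so treat
it as an assumption; delta_base and delta_chain are hypothetical names, not
Subversion functions.

```python
def delta_base(count):
    """Return the predecessor count of the delta base for `count`.

    Assumption: the base is found by clearing the lowest set bit.
    """
    return count & (count - 1)


def delta_chain(count):
    """List the file revisions whose deltas must be applied, newest first.

    The chain ends at 0, the first revision of the file, which is itself
    stored as a delta against empty.
    """
    chain = [count]
    while count > 0:
        count = delta_base(count)
        chain.append(count)
    return chain


print(delta_chain(54))  # [54, 52, 48, 32, 0]
```

Under this assumption the chain has one entry per set bit in the number, plus
the first revision, so its length grows with log(n) rather than with n. That
is why reconstructing the newest revision does not require walking through
every earlier one.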



Re: Is FSFS compressing stored data?

Posted by Mark Phippard <Ma...@softlanding.com>.
kfogel@newton.ch.collab.net wrote on 10/14/2004 08:37:08 AM:

> Greg Hudson <gh...@MIT.EDU> writes:
> > > They both compress.  The reasons BDB is bigger are complex, and have
> > > to do with how databases allocate storage.
> > 
> > Er, no, BDB stores the head revision in uncompressed plaintext,
> > whereas FSFS stores all file revisions as deltas against something.
> > (The first rev of a file is stored as a delta against empty.)
> 
> Sure, the head revision is uncompressed.  But the vast majority of
> data in the repository is compressed, and therefore I think it's fair,
> when speaking in general terms, to say that the repository is
> compressed.  (Unless the question was specifically about head, which I
> may have missed.)

I believe Greg's point is that the presence of these full-texts, 
especially if you have a lot of branches, accounts for a lot of the 
difference in the repository sizes.  Also, if I understand Greg, even the 
head revision in fsfs undergoes some degree of compression, such as 
stripping out whitespace etc...

Mark


Re: Is FSFS compressing stored data?

Posted by kf...@collab.net.
Greg Hudson <gh...@MIT.EDU> writes:
> > They both compress.  The reasons BDB is bigger are complex, and have
> > to do with how databases allocate storage.
> 
> Er, no, BDB stores the head revision in uncompressed plaintext,
> whereas FSFS stores all file revisions as deltas against something.
> (The first rev of a file is stored as a delta against empty.)

Sure, the head revision is uncompressed.  But the vast majority of
data in the repository is compressed, and therefore I think it's fair,
when speaking in general terms, to say that the repository is
compressed.  (Unless the question was specifically about head, which I
may have missed.)


Re: Is FSFS compressing stored data?

Posted by Greg Hudson <gh...@MIT.EDU>.
> They both compress.  The reasons BDB is bigger are complex, and have
> to do with how databases allocate storage.

Er, no, BDB stores the head revision in uncompressed plaintext,
whereas FSFS stores all file revisions as deltas against something.
(The first rev of a file is stored as a delta against empty.)


Re: Is FSFS compressing stored data?

Posted by kf...@collab.net.
quastst@in-euro.de writes:
> I did some testing of the differences between FSFS repos and BDB
> repos, and it looks as though FSFS repos must be compressed. Is this
> true?  I found no information about it in chapter 5 of the book. Btw,
> FSFS repos were slightly slower than BDB repos in my tests. For
> example, I imported a small project of only 8.5 MB. The import took
> 6 s with BDB and 13 s with FSFS, but the resulting size was 5.8 MB for
> FSFS and 13 MB for BDB. The project was a mixture of dirs, sources,
> binaries and tars.  If FSFS compresses files, is it possible to do the
> same in BDB too?

They both compress.  The reasons BDB is bigger are complex, and have
to do with how databases allocate storage.

By the way, when you make a new post to the mailing list, please don't
do it by hitting Reply to an existing post and then changing the
subject line.  The mailer still remembers what thread you replied to,
and so your post shows up as being part of that thread.

   http://subversion.tigris.org/mailing-list-guidelines.html#fresh-post

has more on this.

Thanks,
-Karl
