Posted to dev@subversion.apache.org by Philip Martin <ph...@wandisco.com> on 2012/01/03 15:23:48 UTC

Re: Directory deltification

Stefan Fuhrmann <st...@alice-dsl.de> writes:

> As of r1224839, FSFS now supports directory deltification.
> Please review the changes and run tests against different
> repositories so that we get a better idea of what the costs
> and benefits are. As soon as I'm back home, I will run tests
> against the Apache and KDE repositories.
>
> So far, I ran tests against the rather small TSVN repository.
> It seems that we get 50% more capacity / 33% size savings
> for 0 .. 20% CPU overhead.

I've been testing this with the old CollabNet Subversion repository, the
first 40,515 Subversion revisions, on my Linux laptop:

The db/revs directory (unpacked) is 320MB with directory deltification instead of 490MB without.
Loading takes about 12% more CPU.
Dumping takes about 22% more CPU.

That matches your results.  Packing removes about 85MB from each
repository (with and without deltification).

In some operations, reading the directory representations accounts for
much more of the cost: 'svn log' on a path inside the repository uses 100%
more CPU (twice as much).
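For anyone new to the idea, here is a toy Python model of why storing directories as deltas saves space but costs CPU on read. It is an illustration only, not the FSFS on-disk format, and the class and method names are hypothetical. Each new directory version stores only the entries that changed, so reading a version means replaying a chain of deltas back to the last full listing. As I understand it, FSFS normally bounds chain length with skip-deltas; this sketch uses a plain linear chain to keep it short.

```python
"""Toy model of directory deltification (NOT the real FSFS format)."""
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Listing = Dict[str, str]  # entry name -> node-revision id (hypothetical ids)


@dataclass
class DirDelta:
    """Changes relative to the predecessor listing."""
    added_or_changed: Listing = field(default_factory=dict)
    removed: List[str] = field(default_factory=list)


def make_delta(old: Listing, new: Listing) -> DirDelta:
    changed = {k: v for k, v in new.items() if old.get(k) != v}
    removed = [k for k in old if k not in new]
    return DirDelta(changed, removed)


class DirStore:
    """Stores successive versions of one directory, fulltext or deltified."""

    def __init__(self, deltify: bool):
        self.deltify = deltify
        self.fulltexts: Dict[int, Listing] = {}
        self.deltas: Dict[int, DirDelta] = {}
        self.latest: Optional[Listing] = None
        self.rev = -1

    def store(self, listing: Listing) -> int:
        self.rev += 1
        if self.deltify and self.latest is not None:
            # Only the difference to the previous version is written.
            self.deltas[self.rev] = make_delta(self.latest, listing)
        else:
            self.fulltexts[self.rev] = dict(listing)
        self.latest = dict(listing)
        return self.rev

    def read(self, rev: int):
        """Return (listing, number of deltas that had to be applied)."""
        chain = []
        r = rev
        while r not in self.fulltexts:  # walk back to the nearest fulltext
            chain.append(self.deltas[r])
            r -= 1
        listing = dict(self.fulltexts[r])
        for delta in reversed(chain):  # replay oldest delta first
            for name in delta.removed:
                del listing[name]
            listing.update(delta.added_or_changed)
        return listing, len(chain)

    def stored_entries(self) -> int:
        """Rough size proxy: number of entries written to storage."""
        full = sum(len(l) for l in self.fulltexts.values())
        delta = sum(len(d.added_or_changed) + len(d.removed)
                    for d in self.deltas.values())
        return full + delta


if __name__ == "__main__":
    plain, deltified = DirStore(deltify=False), DirStore(deltify=True)
    listing: Listing = {}
    for i in range(100):  # a directory that gains one file per revision
        listing[f"file{i}.c"] = f"node-{i}"
        plain.store(listing)
        deltified.store(listing)
    print(plain.stored_entries(), deltified.stored_entries())  # 5050 100
    print(plain.read(99)[1], deltified.read(99)[1])            # 0 99
```

The space saving and the read cost come from the same place: the deltified store writes 100 entries instead of 5050, but reading the newest version replays 99 deltas instead of none.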

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: Directory deltification

Posted by Philip Martin <ph...@wandisco.com>.
Hyrum K Wright <hy...@wandisco.com> writes:

> On Tue, Jan 3, 2012 at 8:23 AM, Philip Martin
> <ph...@wandisco.com> wrote:
>> I've been testing this with the old CollabNet Subversion repository, the
>> first 40,515 Subversion revisions, on my Linux laptop:
>>
>> The db/revs directory (unpacked) is 320MB instead of 490MB.
>> Loading takes about 12% more CPU.
>> Dumping takes about 22% more CPU.
>>
>> which matches your results.  Packing removes about 85MB for both
>> repositories.
>>
>> There are operations where reading the directory representations is more
>> dominant.  'svn log' on a path inside the repository uses 100% more CPU.
>
> Is that peak CPU or overall?  If the I/O overhead went down, I'd
> expect the peak CPU usage to go up, but the overall operation time to
> drop.

That's overall CPU used, so the runtime for 'svn log' doubles when the
repository is in RAM.
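A back-of-the-envelope model of Hyrum's question, assuming CPU and I/O time simply add (no overlap) and that I/O time scales with the on-disk size. The 2x CPU factor is the 'svn log' measurement above and 320/490 is the unpacked size ratio; the function names are hypothetical and the break-even figure it computes is illustrative, not measured. With the repository in RAM, I/O is roughly zero, so wall time tracks CPU and doubles.

```python
def wall_time(cpu_s: float, io_s: float) -> float:
    """Crude model: CPU and I/O never overlap, so wall time is their sum."""
    return cpu_s + io_s


def slowdown(cpu_old: float, io_old: float,
             cpu_factor: float, io_factor: float) -> float:
    """New/old wall-time ratio when CPU scales by cpu_factor, I/O by io_factor."""
    old = wall_time(cpu_old, io_old)
    new = wall_time(cpu_old * cpu_factor, io_old * io_factor)
    return new / old


def break_even_io(cpu_old: float, cpu_factor: float, io_factor: float) -> float:
    """I/O time at which the saved I/O exactly pays for the extra CPU."""
    if io_factor >= 1.0:
        return float("inf")  # I/O does not shrink, so it can never pay off
    return cpu_old * (cpu_factor - 1.0) / (1.0 - io_factor)


if __name__ == "__main__":
    cpu_factor = 2.0       # 'svn log' CPU doubled in the measurement above
    io_factor = 320 / 490  # ASSUMPTION: I/O scales with unpacked db/revs size
    # Repository in RAM: no I/O, so runtime doubles along with CPU.
    print(slowdown(1.0, 0.0, cpu_factor, io_factor))  # 2.0
    # On disk, the deltified repository only wins if the original run spent
    # more than this many seconds in I/O per second of CPU:
    print(break_even_io(1.0, cpu_factor, io_factor))
```

Under these assumptions, a faster overall 'svn log' would need the original run to be heavily I/O bound (nearly 3 seconds of I/O per CPU second), which a cached or RAM-resident repository is not.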


Re: Directory deltification

Posted by Hyrum K Wright <hy...@wandisco.com>.
On Tue, Jan 3, 2012 at 8:23 AM, Philip Martin
<ph...@wandisco.com> wrote:
> I've been testing this with the old CollabNet Subversion repository, the
> first 40,515 Subversion revisions, on my Linux laptop:
>
> The db/revs directory (unpacked) is 320MB instead of 490MB.
> Loading takes about 12% more CPU.
> Dumping takes about 22% more CPU.
>
> which matches your results.  Packing removes about 85MB for both
> repositories.
>
> There are operations where reading the directory representations is more
> dominant.  'svn log' on a path inside the repository uses 100% more CPU.

Is that peak CPU or overall?  If the I/O overhead went down, I'd
expect the peak CPU usage to go up, but the overall operation time to
drop.

(FWIW, I'm in the midst of loading a copy of the ASF repo using the
new code, and I'll let folks know the results.)

-Hyrum


Re: Directory deltification

Posted by Stefan Fuhrmann <eq...@web.de>.
On 03.01.2012 15:23, Philip Martin wrote:
> I've been testing this with the old CollabNet Subversion repository, the
> first 40,515 Subversion revisions, on my Linux laptop:
>
> The db/revs directory (unpacked) is 320MB instead of 490MB.
> Loading takes about 12% more CPU.
> Dumping takes about 22% more CPU.
>
> which matches your results.  Packing removes about 85MB for both
> repositories.

Thanks for testing!

Packed Apache repo: 29GB with directory deltification vs. 41GB without.
Packed KDE repo: 39GB vs. 69GB.
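The size figures reported in this thread, turned into relative savings. A fractional size saving s corresponds to a capacity gain of s/(1-s), so "33% size savings" and "50% more capacity" in the original announcement are the same claim stated two ways. The packed CollabNet figures are approximations derived from "packing removes about 85MB for both repositories"; the function names are hypothetical.

```python
# Sizes reported in this thread as (with deltification, without); same unit per row.
MEASUREMENTS = {
    "CollabNet svn, db/revs unpacked (MB)": (320, 490),
    # Derived, approximate: "packing removes about 85MB for both repositories".
    "CollabNet svn, db/revs packed, approx. (MB)": (320 - 85, 490 - 85),
    "Apache, packed (GB)": (29, 41),
    "KDE, packed (GB)": (39, 69),
}


def savings(new: float, old: float) -> float:
    """Fraction of the old size saved: 1 - new/old."""
    return 1 - new / old


def capacity_gain(new: float, old: float) -> float:
    """How much more history fits in the same space: old/new - 1."""
    return old / new - 1


def capacity_from_savings(s: float) -> float:
    """Capacity gain implied by a fractional size saving s."""
    return s / (1 - s)


if __name__ == "__main__":
    for label, (new, old) in MEASUREMENTS.items():
        print(f"{label}: {savings(new, old):.1%} smaller, "
              f"{capacity_gain(new, old):.1%} more capacity")
    # A 1/3 size saving is the same thing as 50% more capacity.
    print(f"{capacity_from_savings(1 / 3):.0%}")
```

Because packing removes roughly the same absolute amount from both copies, the relative saving for the CollabNet repository grows after packing (from about 35% to about 42%).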

I also noticed that "svnadmin dump | svnadmin load"
is much faster than "svnsync file:// file://". My guess
is that revprop changes are somehow expensive
(even on a RAM disk).

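A minimal Python harness for reproducing the dump/load vs. svnsync comparison, assuming svnadmin and svnsync are on PATH and a POSIX system (the hook is a shell script). svnsync refuses to run unless the destination accepts revision property changes, hence a pre-revprop-change hook that always exits 0. The script name, function names, and paths are hypothetical; it only times the two copies and does not show where svnsync spends its time.

```python
"""Time 'svnadmin dump | svnadmin load' against svnsync on local repositories.

Hypothetical usage: python compare_copy.py /path/to/src-repo /tmp/workdir
"""
import stat
import subprocess
import sys
import time
from pathlib import Path


def dump_load_cmds(src: str, dst: str):
    """argv lists for the two halves of the dump | load pipeline."""
    return ["svnadmin", "dump", src], ["svnadmin", "load", dst]


def file_url(path: str) -> str:
    """Absolute file:// URL for a local repository path."""
    return Path(path).resolve().as_uri()


def copy_with_dump_load(src: str, dst: str) -> None:
    subprocess.run(["svnadmin", "create", dst], check=True)
    dump_cmd, load_cmd = dump_load_cmds(src, dst)
    # dump writes the stream to stdout and progress messages to stderr.
    dump = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL)
    load = subprocess.Popen(load_cmd, stdin=dump.stdout,
                            stdout=subprocess.DEVNULL)
    dump.stdout.close()  # so dump sees a broken pipe if load exits early
    load_rc, dump_rc = load.wait(), dump.wait()
    if load_rc or dump_rc:
        raise RuntimeError(f"dump exited {dump_rc}, load exited {load_rc}")


def copy_with_svnsync(src: str, dst: str) -> None:
    subprocess.run(["svnadmin", "create", dst], check=True)
    # svnsync needs the destination to allow revision property changes.
    hook = Path(dst, "hooks", "pre-revprop-change")
    hook.write_text("#!/bin/sh\nexit 0\n")
    hook.chmod(hook.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    dst_url, src_url = file_url(dst), file_url(src)
    subprocess.run(["svnsync", "initialize", dst_url, src_url],
                   check=True, stdout=subprocess.DEVNULL)
    subprocess.run(["svnsync", "synchronize", dst_url],
                   check=True, stdout=subprocess.DEVNULL)


def timed(fn, *args) -> float:
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) == 3:
    src, work = sys.argv[1], Path(sys.argv[2])
    work.mkdir(parents=True, exist_ok=True)
    t_dl = timed(copy_with_dump_load, src, str(work / "via-dump-load"))
    t_sync = timed(copy_with_svnsync, src, str(work / "via-svnsync"))
    print(f"dump|load: {t_dl:.1f}s  svnsync: {t_sync:.1f}s")
```

Pointing the work directory at a tmpfs mount would repeat the RAM-disk setup mentioned above.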
> There are operations where reading the directory representations is more
> dominant.  'svn log' on a path inside the repository uses 100% more CPU.
>
Good finding. I should find some time to look into
this in the next two months or so. My guess is that
combining deltas is the critical operation here.
In that case, almost the whole overhead can be
eliminated.
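To make "combining deltas" concrete: if rev 2 is stored as a delta against rev 1, which is itself a delta against rev 0, the two deltas can be composed into a single delta against rev 0, so a reader applies one delta instead of two. The sketch below uses a simplified toy format (copy-from-source and insert-literal ops) rather than svndiff, which also has copy-from-target instructions and works in windows. It is not Subversion's implementation, and the function names are hypothetical.

```python
from typing import List, Tuple, Union

Copy = Tuple[str, int, int]  # ("copy", source_offset, length)
Insert = Tuple[str, bytes]   # ("insert", literal_bytes)
Op = Union[Copy, Insert]


def op_len(op: Op) -> int:
    """Number of target bytes an op produces."""
    return op[2] if op[0] == "copy" else len(op[1])


def apply(delta: List[Op], source: bytes) -> bytes:
    """Rebuild the target text from `source` and a toy delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, off, n = op
            out += source[off:off + n]
        else:
            out += op[1]
    return bytes(out)


def slice_delta(delta: List[Op], start: int, length: int) -> List[Op]:
    """Ops of `delta` that produce target bytes [start, start + length)."""
    result: List[Op] = []
    pos, end = 0, start + length  # pos: target offset where the op begins
    for op in delta:
        n = op_len(op)
        lo, hi = max(start, pos), min(end, pos + n)
        if lo < hi:
            skip = lo - pos
            if op[0] == "copy":
                result.append(("copy", op[1] + skip, hi - lo))
            else:
                result.append(("insert", op[1][skip:skip + hi - lo]))
        pos += n
        if pos >= end:
            break
    return result


def compose(first: List[Op], second: List[Op]) -> List[Op]:
    """One delta equivalent to applying `first`, then `second`."""
    combined: List[Op] = []
    for op in second:
        if op[0] == "copy":
            # A copy from the intermediate text maps back onto first's ops.
            combined.extend(slice_delta(first, op[1], op[2]))
        else:
            combined.append(op)
    return combined


if __name__ == "__main__":
    base = b"alpha beta gamma\n"
    first = [("copy", 0, 6), ("insert", b"BETA"), ("copy", 10, 7)]
    second = [("insert", b"# "), ("copy", 6, 11)]
    both = compose(first, second)
    print(apply(both, base))  # b'# BETA gamma\n'
```

Composition is done once per delta pair, so when the same composed delta serves many reads, the per-read replay cost shrinks.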

-- Stefan^2.