Posted to dev@subversion.apache.org by Stefan Fuhrmann <st...@alice-dsl.de> on 2011/12/27 01:30:38 UTC

Directory deltification

As of r1224839, FSFS now supports directory deltification.
Please review the changes and run tests against different
repositories so that we get a better idea of what the costs
and benefits are. As soon as I'm back home, I will run tests
against the Apache and KDE repositories.

So far, I ran tests against the rather small TSVN repository.
It seems that we get 50% more capacity / 33% size savings
for 0 .. 20% CPU overhead. The savings should be more
significant on larger repositories and some of the extra
overhead should be removed by the file handle caching
code - once merged into /trunk.
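
The "50% more capacity" and "33% size savings" figures are two views of
the same ratio. A minimal sketch of the arithmetic follows; the function
name is made up for illustration, and the sizes are the trunk and
trunk+diff values from the attached results:

```python
def capacity_gain(size_savings):
    """Fractional increase in data that fits into the same disk space
    when each repository shrinks by `size_savings` (0.33 means 33%)."""
    return 1.0 / (1.0 - size_savings) - 1.0

# Repository sizes in MB, taken from the attached measurements.
trunk_mb = 414.9
trunk_diff_mb = 275.9

tsvn_savings = 1.0 - trunk_diff_mb / trunk_mb
print(f"size savings:  {tsvn_savings:.1%}")
print(f"capacity gain: {capacity_gain(tsvn_savings):.1%}")
```

Saving one third of the space means three repositories fit where two
did before, which is where the 50% comes from.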

In any case, we are still much faster than 1.6.
Detailed results can be found in the attached document.
"/trunk" is at r1224828 (i.e. without deltification tuning).

-- Stefan^2.

Re: Directory deltification

Posted by Stefan Fuhrmann <st...@alice-dsl.de>.
On 27.12.2011 14:20, Daniel Shahaf wrote:
> Stefan Fuhrmann wrote on Tue, Dec 27, 2011 at 01:30:38 +0100:
>> As of r1224839, FSFS now supports directory deltification.
> If you haven't seen it already, I opened an issue around this just recently:
>
> http://subversion.tigris.org/issues/show_bug.cgi?id=4084
I'm aware of that. That discussion was based
on outdated information and the only reasonable
way to address this issue is by providing updated
information.

-- Stefan^2.

Re: Directory deltification

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Stefan Fuhrmann wrote on Tue, Dec 27, 2011 at 01:30:38 +0100:
> As of r1224839, FSFS now supports directory deltification.

If you haven't seen it already, I opened an issue around this just recently:

http://subversion.tigris.org/issues/show_bug.cgi?id=4084


> Please review the changes and run tests against different
> repositories so that we get a better idea of what the costs
> and benefits are. As soon as I'm back home, I will run tests
> against the Apache and KDE repositories.
> 
> So far, I ran tests against the rather small TSVN repository.
> It seems that we get 50% more capacity / 33% size savings
> for 0 .. 20% CPU overhead. The savings should be more
> significant on larger repositories and some of the extra
> overhead should be removed by the file handle caching
> code - once merged into /trunk.
> 
> In any case, we are still much faster than 1.6.
> Detailed results can be found in the attached document.
> "/trunk" is at r1224828 (i.e. without deltification tuning).
> 
> -- Stefan^2.

> Ubuntu 11.10, 64 bit (ssd), packed repository
> 
> repository size
> 
> 1.6.12               414.9MB   (100%)
> trunk                414.9MB   (100%)
> trunk+diff           275.9MB   ( 66%)
> 
> svnadmin load -q (4.3GB non-deltified dump file)
> 
> 1.6.12               5m25.625s (156%)
> trunk                3m29.225s (100%)
> trunk -M1000         3m 1.455s ( 87%)
> trunk+diff           4m 1.313s (115%)
> trunk+diff -M1000    3m23.496s ( 97%)
> 
> svnadmin verify -q
> 
> 1.6.12              68m21.244s (3877%)
> trunk                1m45.786s (100%)
> trunk -M1000         1m18.207s ( 74%)
> trunk+diff           2m10.199s (123%)
> trunk+diff -M1000    1m36.188s ( 91%)
> 
> svnserve -dT (additional flags for trunk: -c0)
> svn export svn://localhost/repo/trunk -q --ignore-externals (cold)
> (85MB, 3840 items)
> 
> 1.6.12                 18.223s (883%)
> trunk                   2.063s (100%)
> trunk -M1000            2.888s (140%)
> trunk+diff              2.607s (126%)
> trunk+diff -M1000       2.691s (130%)
> 
> svn export svn://localhost/repo/trunk -q --ignore-externals (hot)
> 
> 1.6.12                 17.743s (841%)
> trunk                   2.111s (100%)
> trunk -M1000            0.901s ( 43%)
> trunk+diff              2.412s (114%)
> trunk+diff -M1000       0.994s ( 47%)
> 
> svn ls svn://localhost/repo/tags (cold)
> (accuracy +/- 2ms)
> 
> 1.6.12                    66ms (125%)
> trunk                     53ms (100%)
> trunk -M1000             128ms (242%)
> trunk+diff                52ms ( 98%)
> trunk+diff -M1000        123ms (232%)
> 
> svn ls svn://localhost/repo/tags (hot)
> (accuracy +/- 2ms)
> 
> 1.6.12                    66ms (140%)
> trunk                     47ms (100%)
> trunk -M1000              49ms (104%)
> trunk+diff                45ms ( 96%)
> trunk+diff -M1000         47ms (100%)
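
The percentage column in the tables above appears to be each run time
relative to the plain "trunk" run. A small sketch for re-deriving it
from the minute/second timings follows; the helper names are
hypothetical, and the millisecond rows are not handled:

```python
import re

def seconds(text):
    """Convert a duration such as '3m29.225s', '4m 1.313s' or '2.063s'
    into seconds."""
    m = re.fullmatch(r"(?:(\d+)m\s*)?(\d+(?:\.\d+)?)s", text.strip())
    if m is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(m.group(1) or 0)
    return minutes * 60 + float(m.group(2))

def relative(run, baseline):
    """Run time as a percentage of the baseline run time."""
    return 100.0 * seconds(run) / seconds(baseline)

# 'svnadmin load' timings from the table: trunk+diff against trunk.
print(f"{relative('4m 1.313s', '3m29.225s'):.0f}%")
```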
> 
> 


Re: Directory deltification

Posted by Philip Martin <ph...@wandisco.com>.
Hyrum K Wright <hy...@wandisco.com> writes:

> On Tue, Jan 3, 2012 at 8:23 AM, Philip Martin
> <ph...@wandisco.com> wrote:
>> I've been testing this with the old CollabNet Subversion repository, the
>> first 40,515 Subversion revisions, on my Linux laptop:
>>
>> The db/revs directory (unpacked) is 320MB instead of 490MB.
>> Loading takes about 12% more CPU.
>> Dumping takes about 22% more CPU.
>>
>> which matches your results.  Packing removes about 85MB for both
>> repositories.
>>
>> There are operations where reading the directory representations is more
>> dominant.  'svn log' on a path inside the repository uses 100% more CPU.
>
> Is that peak CPU or overall?  If the I/O overhead went down, I'd
> expect the peak CPU usage to go up, but the overall operation time to
> drop.

That's overall CPU used, so the runtime for 'svn log' doubles when the
repository is in RAM.

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com

Re: Directory deltification

Posted by Hyrum K Wright <hy...@wandisco.com>.
On Tue, Jan 3, 2012 at 8:23 AM, Philip Martin
<ph...@wandisco.com> wrote:
> Stefan Fuhrmann <st...@alice-dsl.de> writes:
>
>> As of r1224839, FSFS now supports directory deltification.
>> Please review the changes and run tests against different
>> repositories so that we get a better idea of what the costs
>> and benefits are. As soon as I'm back home, I will run tests
>> against the Apache and KDE repositories.
>>
>> So far, I ran tests against the rather small TSVN repository.
>> It seems that we get 50% more capacity / 33% size savings
>> for 0 .. 20% CPU overhead.
>
> I've been testing this with the old CollabNet Subversion repository, the
> first 40,515 Subversion revisions, on my Linux laptop:
>
> The db/revs directory (unpacked) is 320MB instead of 490MB.
> Loading takes about 12% more CPU.
> Dumping takes about 22% more CPU.
>
> which matches your results.  Packing removes about 85MB for both
> repositories.
>
> There are operations where reading the directory representations is more
> dominant.  'svn log' on a path inside the repository uses 100% more CPU.

Is that peak CPU or overall?  If the I/O overhead went down, I'd
expect the peak CPU usage to go up, but the overall operation time to
drop.

(FWIW, I'm in the midst of loading a copy of the ASF repo using the
new code, and I'll let folks know the results.)

-Hyrum

-- 

uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com/

Re: Directory deltification

Posted by Stefan Fuhrmann <eq...@web.de>.
On 03.01.2012 15:23, Philip Martin wrote:
> Stefan Fuhrmann <st...@alice-dsl.de> writes:
>
>> As of r1224839, FSFS now supports directory deltification.
>> Please review the changes and run tests against different
>> repositories so that we get a better idea of what the costs
>> and benefits are. As soon as I'm back home, I will run tests
>> against the Apache and KDE repositories.
>>
>> So far, I ran tests against the rather small TSVN repository.
>> It seems that we get 50% more capacity / 33% size savings
>> for 0 .. 20% CPU overhead.
> I've been testing this with the old CollabNet Subversion repository, the
> first 40,515 Subversion revisions, on my Linux laptop:
>
> The db/revs directory (unpacked) is 320MB instead of 490MB.
> Loading takes about 12% more CPU.
> Dumping takes about 22% more CPU.
>
> which matches your results.  Packing removes about 85MB for both
> repositories.
Thanks for testing!

Packed Apache repo is 29 vs. 41GB.
Packed KDE repo is 39 vs. 69GB.
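
To compare these sizes with the TSVN result, the relative savings can be
worked out as sketched below. This assumes the smaller number in each
pair is the deltified repository; the function and dictionary names are
made up for illustration:

```python
def savings(deltified, plain):
    """Fraction of space saved by the deltified repository."""
    return 1.0 - deltified / plain

# (deltified, non-deltified) sizes as reported in this thread.
repos = {
    "CollabNet svn, unpacked (MB)": (320, 490),
    "Apache, packed (GB)": (29, 41),
    "KDE, packed (GB)": (39, 69),
}

for name, (deltified, plain) in repos.items():
    print(f"{name}: {savings(deltified, plain):.0%} smaller")
```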

I also noticed that "svndump dump | svndump load"
is much faster than "svnsync file:// file://". My guess
is that revprop changes are somehow expensive
(even on a RAM disk).
> There are operations where reading the directory representations is more
> dominant.  'svn log' on a path inside the repository uses 100% more CPU.
>
Good finding. I should find some time to look into
this in the next two months or so. My guess is that
combining deltas is the critical operation here.
In that case, almost the whole overhead can be
eliminated.

-- Stefan^2.



Re: Directory deltification

Posted by Philip Martin <ph...@wandisco.com>.
Stefan Fuhrmann <st...@alice-dsl.de> writes:

> As of r1224839, FSFS now supports directory deltification.
> Please review the changes and run tests against different
> repositories so that we get a better idea of what the costs
> and benefits are. As soon as I'm back home, I will run tests
> against the Apache and KDE repositories.
>
> So far, I ran tests against the rather small TSVN repository.
> It seems that we get 50% more capacity / 33% size savings
> for 0 .. 20% CPU overhead.

I've been testing this with the old CollabNet Subversion repository, the
first 40,515 Subversion revisions, on my Linux laptop:

The db/revs directory (unpacked) is 320MB instead of 490MB.
Loading takes about 12% more CPU.
Dumping takes about 22% more CPU.

which matches your results.  Packing removes about 85MB for both
repositories.

There are operations where reading the directory representations is more
dominant.  'svn log' on a path inside the repository uses 100% more CPU.

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com