You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Greg Stein <gs...@gmail.com> on 2009/04/24 09:58:11 UTC

Re: svnadmin verify performance issue (was: Re: How long do your svn dumps take)

On Thu, Apr 23, 2009 at 21:08, Stefan Sperling <st...@elego.de> wrote:
> On Thu, Apr 23, 2009 at 01:33:08PM -0500, kmradke@rockwellcollins.com wrote:
>> kmradke@rockwellcollins.com wrote on 04/22/2009 04:16:12 PM:
>> > (This appears to have bounced the first time.  Retrying...)
>> >
>> > I didn't see many options to gprof, so instead, I rebuilt subversion
>> > (and all of it's dependencies) with -pg to include profiling for
>> > all functions and re-ran the same test.  It took over 13 minutes
>> > to run this time, and looks like it includes more information.  I've
>> > attached the new gprof output.
>> >
>> > It may very well be spending a significant amount of time
>> > in ap_hash_next, since it is called over 2 billion times.
>> > apr_pstrdup is called 2.7 billion times and apr_palloc
>> > is called 2.1 billion times...
>> >
>> > This is only verifying 21 revisions, and the repo itself
>> > has over 13000...
>> >
>> > The svnadmin process didn't appear to use more than 44MB
>> > of RAM during the test, but it did use 100% of one CPU
>> > pretty much the whole time.
>>
>> Yet another datapoint, I dumped the repo, reorganized the
>> history with svndumptool to create no more than 100
>> directories per subdirectory and retested.  A full
>> "svnadmin verify" of this modified repo with the
>> same number of revisions and total files only takes
>> 6 minutes now instead of 40 hours.
>>
>> Something in "svnadmin verify" doesn't appear to like
>> large directories...
>
> I still believe in my theory, regardless of Bert's comments :)
>
> I will try to make a patch eventually if no one else gets around
> to it, and test my theory.

Note that filling a *large* hash table has O(N^2) time complexity,
which is why it gets so slow, I bet. Each time that hash table has to
grow, it needs to re-insert all the previously-inserted entries.

Unfortunately, there isn't even a way to pre-size a hash table when
you create it.

But: if you re-use a table, then it will be pre-sized. So yeah... you
may be able to fix the speed problem.
(tho I gotta believe the time-constant on that O(N^2) is pretty damned small...)

Cheers,
-g

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1889210


Re: svnadmin verify performance issue (was: Re: How long do your svn dumps take)

Posted by Greg Stein <gs...@gmail.com>.
On Fri, Apr 24, 2009 at 17:33, Branko Čibej <br...@xbc.nu> wrote:
> Greg Stein wrote:
>> Note that filling a *large* hash table has O(N^2) time complexity,
>> which is why it gets so slow, I bet. Each time that hash table has to
>> grow, it needs to re-insert all the previously-inserted entries.
>>
>
> I thought it was O(N logN) if the hash table doubles in size every time
> it has to grow?

Could be. Not concerned anough to apply the brain power, so I'll just
take your word for it, or to say it is "somewhere in there".

Cheers,
-g

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1894337


Re: svnadmin verify performance issue (was: Re: How long do your svn dumps take)

Posted by Branko Cibej <br...@xbc.nu>.
Greg Stein wrote:
> Note that filling a *large* hash table has O(N^2) time complexity,
> which is why it gets so slow, I bet. Each time that hash table has to
> grow, it needs to re-insert all the previously-inserted entries.
>   

I thought it was O(N logN) if the hash table doubles in size every time
it has to grow?

-- Brane

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1893978