You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Erik Huelsmann <e....@gmx.net> on 2004/09/05 15:45:35 UTC

issue 2043 profiling data

Ok. I ran tests to see what happens with issue 2043 (committing 300.000+
files).

When committing, this process happens:
- the client collects (harvests) the targets it needs to commit;
- an RA session is opened to do the actual commit.

With gprof I found that the case described in the mail referenced by the
issue, it takes hours to prepare for the commit, ie the client takes hours
to complete the harvesting. Most of that time is spent in
look_up_committable as seen from the diagram below:

Each sample counts as 0.000999001 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 93.32    328.51   328.51   324343     0.00     0.00  look_up_committable
  0.97    331.92     3.41  2902088     0.00     0.00  svn_utf__is_valid


The client has been running with 96%+ CPU for 4 hrs while collecting
committables when an error occurs (because it cannot connect) and the client
exits.

There is one problem though: The program ran for about 4 hours, but the
gprof table seems to suggest that only 328.51 seconds elapsed in
look_up_committable (which is 93% of all time elapsed).

Can someone help me explain the data above and the relation with the 'time'
output which gives 229minutes?


Thanks!

bye,

Erik.

-- 
Superg�nstige DSL-Tarife + WLAN-Router f�r 0,- EUR*
Jetzt zu GMX wechseln und sparen http://www.gmx.net/de/go/dsl


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: issue 2043 profiling data

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Monday, September 6, 2004 7:36 AM +0200 Erik Huelsmann 
<e....@gmx.net> wrote:

> So real and user are not that far apart. Do you mean that real is wall-clock
> and user is processor time? Tobias was trying to convince me the problem was
> with swapping and waiting for the I/O, but if you are correct, then swapping
> will not be the cause of this long execution time.

*If* gprof isn't giving bogus data (what does oprofile say if you have access 
to it?) and look_up_committables is really the big bottleneck, I'd say that 
the iterative strcmp case at a roughly O(300,000^2) is the problem.  300,000 
entries in the array, we do a *linear* search on the array, and then call 
look_up_comittables on each of the 300k files in the directory, exit at first 
match.  I'd think a better data structure there should remove that bottleneck. 
And, this would explain why the time isn't sys time - it's CPU time doing 
strcmp's.  My hunch from a 30 sec glance at the code.  -- justin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: issue 2043 profiling data

Posted by Erik Huelsmann <e....@gmx.net>.
> >There is one problem though: The program ran for about 4 hours, but the
> >gprof table seems to suggest that only 328.51 seconds elapsed in
> >look_up_committable (which is 93% of all time elapsed).
> >
> >Can someone help me explain the data above and the relation with the
> 'time'
> >output which gives 229minutes?
> >  
> >
> I think gprof will show CPU time, while most of the wall clock time is 
> blocking in I/O. 'time' should actually tell you user, system and wall 
> clock time; which of them is 229 minutes?

Re: issue 2043 profiling data

Posted by Branko Čibej <br...@xbc.nu>.
Erik Huelsmann wrote:

>Ok. I ran tests to see what happens with issue 2043 (committing 300.000+
>files).
>
>When committing, this process happens:
>- the client collects (harvests) the targets it needs to commit;
>- an RA session is opened to do the actual commit.
>
>With gprof I found that the case described in the mail referenced by the
>issue, it takes hours to prepare for the commit, ie the client takes hours
>to complete the harvesting. Most of that time is spent in
>look_up_committable as seen from the diagram below:
>
>Each sample counts as 0.000999001 seconds.
>  %   cumulative   self              self     total
> time   seconds   seconds    calls   s/call   s/call  name
> 93.32    328.51   328.51   324343     0.00     0.00  look_up_committable
>  0.97    331.92     3.41  2902088     0.00     0.00  svn_utf__is_valid
>
>
>The client has been running with 96%+ CPU for 4 hrs while collecting
>committables when an error occurs (because it cannot connect) and the client
>exits.
>
>There is one problem though: The program ran for about 4 hours, but the
>gprof table seems to suggest that only 328.51 seconds elapsed in
>look_up_committable (which is 93% of all time elapsed).
>
>Can someone help me explain the data above and the relation with the 'time'
>output which gives 229minutes?
>  
>
I think gprof will show CPU time, while most of the wall clock time is 
blocking in I/O. 'time' should actually tell you user, system and wall 
clock time; which of them is 229 minutes?

-- Brane


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org