Posted to dev@subversion.apache.org by Matt Doran <ma...@bigpond.com> on 2004/08/05 02:10:30 UTC

1.1rc1 performance regression in 'svn status'

In my testing of 1.1rc1 on Windows (W2K SP4), I've noticed a big performance
regression on 'svn status'.  I'm not sure if this behaviour occurs in other
commands, but I've examined svn status in some detail.

I have a large repository that contains about 5000 files, with some
directories containing hundreds of files.  When I do an 'svn status' I get
the following results:

SVN 1.0.6:  ~ 4 secs
SVN 1.1rc1:  ~ 38 secs !!

I used the Filemon utility from sysinternals to try to determine what is
happening.  It looks like 1.1 hits the 'iconv' files excessively (e.g.
windows-1252.so, utf-8.so, etc.).

In 1.0 it appears to hit these files early in the process, but then doesn't
touch them while checking the *.svn-work and *.svn-base files.

In 1.1, however, for *every* hit on *.svn-work, etc., you get multiple hits on
various iconv files.  They are always accessed in the following order:
* cp1252.so   (attempted open once, not found)
* windows-cp1252.so   (opened, closed, stat'd 3 times)
* _tbl_simple.so   (opened, closed, stat'd 3 times)
* cp1252.so   (attempted open once, not found)
* windows-cp1252.so   (opened, closed, stat'd 3 times)
* utf-8.so   (opened, closed, stat'd 3 times)

It seems that this is the thing killing performance.

I've attached a zip file with the repro script (that creates a 16 file
repository) and the output from filemon for the 'svn status' with both 1.0
and 1.1.  You can even see from the size of the 1.1 file that it does *a
lot* more work.

(Hopefully I haven't done anything stupid in my setup that makes 1.1rc1
behave this way!  I made sure that the iconv path env variable was set to
the right location.   Let me know if you need more info to track this one
down. I'm happy to help).

Regards,
Matt Doran





---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: 1.1rc1 performance regression in 'svn status'

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Thu, 5 Aug 2004, Branko Čibej wrote:

> Peter N. Lundblad wrote:
>
> >When we use iconv, we cache the iconv handle in the current pool. I guess
> >we are doing some charset encoding in a loop with a subpool that gets
> >cleared or something.
> >
> >Couldn't we use thread-specific data for this instead?
> >
> No, because we have no way to initialize it, since we have neither
> library initializers nor a context parameter for each function.
>
Was thinking of apr_thread_once. Then I discovered apr_thread_once_init.
What's the point of thread_once if it needs to be initialized exactly
once? Is there nothing corresponding to PTHREAD_ONCE_INIT?

> The only problem is that you can't initialize it sanely if you're not
> controlling thread creation. See apr_thread_proc.h.
>
*sigh*

OK, the current caching is an improvement, but not enough since apr_iconv
doesn't do caching. Too bad.

Thanks,
//Peter



Re: 1.1rc1 performance regression in 'svn status'

Posted by Branko Čibej <br...@xbc.nu>.
Peter N. Lundblad wrote:

>On Wed, 4 Aug 2004, Matt Doran wrote:
>
>>I have a large repository that contains about 5000 files, with some
>>directories containing hundreds of files.  When I do an 'svn status' I get
>>the following results:
>>
>>SVN 1.0.6:  ~ 4 secs
>>SVN 1.1rc1:  ~ 38 secs !!
>>
>>I used the Filemon utility from sysinternals to try to determine what is
>>happening.  It looks like 1.1 excessively hits 'iconv' files. (e.g.
>>windows-1252.so, utf-8.so, etc)
>>
>>In 1.0 it appears to hit these files early in the process, but then doesn't
>...
>
>When we use iconv, we cache the iconv handle in the current pool. I guess
>we are doing some charset encoding in a loop with a subpool that gets
>cleared or something.
>
>Couldn't we use thread-specific data for this instead?
>
No, because we have no way to initialize it, since we have neither
library initializers nor a context parameter for each function.

>Another idea would
>be to cache the handle in the root of the pool hierarchy. I don't know if
>this might cross thread boundaries, so that might not be a good idea.
>
Yes, it might, and we threw it out for exactly that reason -- we'd done 
so in the past.

>FWIW, I don't get this problem on Linux, but glibc's iconv might be better
>at caching.
>
The problem with apr-iconv is that it loads actual converters as shared 
libraries, and loads them every time you create a conversion context.
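To illustrate the cost being described, here is a minimal sketch in Python (not apr-iconv's actual code). The ConverterLoader class and the loads_for helper are hypothetical names invented for this example; they only count how often a converter module would be loaded when every new conversion context reloads its converters, compared with a loader that remembers what it has already loaded.

```python
class ConverterLoader:
    """Hypothetical model of a library that loads one module per charset."""

    def __init__(self, cache_modules):
        self.cache_modules = cache_modules
        self.loads = 0          # number of simulated shared-library loads
        self._loaded = {}       # charset name -> already loaded module

    def _load(self, name):
        # Stands in for opening and mapping a converter shared library.
        self.loads += 1
        return {"name": name}

    def open_context(self, frompage, topage):
        """Create a conversion context, loading both converters."""
        modules = []
        for name in (frompage, topage):
            if self.cache_modules:
                if name not in self._loaded:
                    self._loaded[name] = self._load(name)
                modules.append(self._loaded[name])
            else:
                # No module cache: every context pays the load cost again.
                modules.append(self._load(name))
        return tuple(modules)


def loads_for(n_contexts, cache_modules):
    """Return how many module loads n_contexts context creations cause."""
    loader = ConverterLoader(cache_modules)
    for _ in range(n_contexts):
        loader.open_context("windows-1252", "utf-8")
    return loader.loads
```

Under these assumptions, loads_for(10, cache_modules=False) counts 20 loads while loads_for(10, cache_modules=True) counts 2, which is the kind of difference a module cache inside the conversion library would make.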

>Is there any problem with TSD in APR? Don't know how it works on Windows.
>
The only problem is that you can't initialize it sanely if you're not 
controlling thread creation. See apr_thread_proc.h.

-- Brane



Re: 1.1rc1 performance regression in 'svn status'

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Wed, 4 Aug 2004, Matt Doran wrote:

> I have a large repository that contains about 5000 files, with some
> directories containing hundreds of files.  When I do an 'svn status' I get
> the following results:
>
> SVN 1.0.6:  ~ 4 secs
> SVN 1.1rc1:  ~ 38 secs !!
>
> I used the Filemon utility from sysinternals to try to determine what is
> happening.  It looks like 1.1 excessively hits 'iconv' files. (e.g.
> windows-1252.so, utf-8.so, etc)
>
> In 1.0 it appears to hit these files early in the process, but then doesn't
...

When we use iconv, we cache the iconv handle in the current pool. I guess
we are doing some charset encoding in a loop with a subpool that gets
cleared or something.

Couldn't we use thread-specific data for this instead? Another idea would
be to cache the handle in the root of the pool hierarchy. I don't know if
this might cross thread boundaries, so that might not be a good idea.
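A rough Python sketch of the suspected effect follows. It is a simulation only, not Subversion or APR code: Pool, XlateCounter and convert_in_loop are hypothetical names, and the "handle" is a dummy object. It shows how a handle cached in a per-iteration subpool is lost every time that subpool is cleared, while a handle cached in a longer-lived pool is opened once.

```python
class Pool:
    """Hypothetical stand-in for an APR pool that can hold cached data."""

    def __init__(self):
        self.userdata = {}

    def clear(self):
        # Clearing a pool discards everything that was cached in it.
        self.userdata.clear()


class XlateCounter:
    """Counts how often the expensive 'open' step has to run."""

    def __init__(self):
        self.opens = 0

    def get_handle(self, pool, frompage, topage):
        key = (frompage, topage)
        handle = pool.userdata.get(key)
        if handle is None:
            self.opens += 1      # stands in for the costly converter open
            handle = object()
            pool.userdata[key] = handle
        return handle


def convert_in_loop(n_items, use_subpool):
    """Return the number of opens needed to convert n_items paths."""
    counter = XlateCounter()
    root = Pool()
    subpool = Pool()
    for _ in range(n_items):
        pool = subpool if use_subpool else root
        counter.get_handle(pool, "utf-8", "cp1252")
        subpool.clear()          # per-iteration cleanup, as in many loops
    return counter.opens
```

In this model, convert_in_loop(100, use_subpool=False) needs a single open, whereas convert_in_loop(100, use_subpool=True) needs 100, one per iteration.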

FWIW, I don't get this problem on Linux, but glibc's iconv might be better
at caching.

Is there any problem with TSD in APR? Don't know how it works on Windows.

Regards,
//Peter


Re: 1.1rc1 performance regression in 'svn status'

Posted by Branko Čibej <br...@xbc.nu>.
Peter N. Lundblad wrote:

>On Wed, 4 Aug 2004, Matt Doran wrote:
>
>>SVN 1.0.6:  ~ 4 secs
>>SVN 1.1rc1:  ~ 38 secs !!
>>
>>I used the Filemon utility from sysinternals to try to determine what is
>>happening.  It looks like 1.1 excessively hits 'iconv' files. (e.g.
>>windows-1252.so, utf-8.so, etc)
>>
>I did some more checking in GDB. It appears that apr_xlate_open gets
>called 4312 times out of 17204 calls to get_xlate_handle (in
>libsvn_subr/utf.c).
>This was during a run of "svn st" on trunk. This caching strategy needs to
>be improved.
>
That's a 75% hit rate, which -- given our constraints -- isn't all that 
bad. The problem is that we can only safely cache the xlate handle in 
the current pool. If that happens to be a subpool in some loop -- well, 
tough luck.
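For reference, the arithmetic behind that figure can be checked with a few lines of Python. This assumes that each apr_xlate_open call corresponds to exactly one cache miss in get_xlate_handle; the cache_hit_rate helper is a name made up for this example.

```python
def cache_hit_rate(total_calls, misses):
    """Fraction of calls that were served from the cache."""
    return (total_calls - misses) / total_calls


# Numbers quoted from the GDB run above.
rate = cache_hit_rate(total_calls=17204, misses=4312)
print(f"{rate:.1%}")  # prints 74.9%
```

So roughly one call in four still pays for a full apr_xlate_open.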

>I don't know why it is faster on 1.0, though.
>
We probably fixed some pool usage bugs in the meantime, and those fixes 
happened to narrow the scopes of some xlate handles.



Re: 1.1rc1 performance regression in 'svn status'

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Wed, 4 Aug 2004, Matt Doran wrote:

> SVN 1.0.6:  ~ 4 secs
> SVN 1.1rc1:  ~ 38 secs !!
>
> I used the Filemon utility from sysinternals to try to determine what is
> happening.  It looks like 1.1 excessively hits 'iconv' files. (e.g.
> windows-1252.so, utf-8.so, etc)
>
I did some more checking in GDB. It appears that apr_xlate_open gets
called 4312 times out of 17204 calls to get_xlate_handle (in
libsvn_subr/utf.c).
This was during a run of "svn st" on trunk. This caching strategy needs to
be improved.
I don't know why it is faster on 1.0, though.

