You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Johan Corveleyn <jo...@uz.kuleuven.ac.be> on 2009/06/25 23:42:50 UTC

svn log on FSFS - disk IO for revs/revprops

Hi all,

Disclaimer: if this isn't the right forum to discuss this, please let me know. But I don't think this is a "users" kind of question, so I'm addressing the dev list. Also: I don't know much about svn internals, so forgive me for making any false assumptions or stupid suggestions ...

I have looked at some of the disk IO that the svn server performs when it's asked for an "svn log" (I wanted to see what we could do on OS/disk/NFS level to improve it). For this, we used rwsnoop (http://www.manpagez.com/man/1/rwsnoop/), while running "svn log" for a file with lots of changes. 

Server: 
1.5.4 on Solaris 10 (binaries from sunfreeware.com)
FSFS backend on NFS mounted volume

Exact client command: "svn log URL/file_with_lots_of_changes"
The file has had 5500 revisions. It takes ~4 minutes to retrieve its entire log.

Here is an excerpt of rwsnoop output:
------
UID      PID CMD          D   BYTES FILE
24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
24435   9896 httpd        R      64 /path/to/repos/db/revs/58/58301
24435   9896 httpd        R     276 /path/to/repos/db/revs/58/58301
24435   9896 httpd        R     276 /path/to/repos/db/revs/58/58301
24435   9896 httpd        R     412 /path/to/repos/db/revs/58/58301
24435   9896 httpd        R     572 /path/to/repos/db/revs/58/58301
24435   9896 httpd        R    3699 /path/to/repos/db/revs/58/58301
24435   9896 httpd        R    3871 /path/to/repos/db/revs/58/58301
24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58258
24435   9896 httpd        R     131 /path/to/repos/db/revprops/58/58301
24435   9896 httpd        R      64 /path/to/repos/db/revs/58/58258
24435   9896 httpd        R     275 /path/to/repos/db/revs/58/58258
24435   9896 httpd        R     275 /path/to/repos/db/revs/58/58258
24435   9896 httpd        R     411 /path/to/repos/db/revs/58/58258
24435   9896 httpd        R     570 /path/to/repos/db/revs/58/58258
24435   9896 httpd        R    3694 /path/to/repos/db/revs/58/58258 
...
24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58255
24435   9896 httpd        R     118 /path/to/repos/db/revprops/58/58258
24435   9896 httpd        R      64 /path/to/repos/db/revs/58/58255
24435   9896 httpd        R     275 /path/to/repos/db/revs/58/58255
24435   9896 httpd        R     275 /path/to/repos/db/revs/58/58255
24435   9896 httpd        R     411 /path/to/repos/db/revs/58/58255
24435   9896 httpd        R     570 /path/to/repos/db/revs/58/58255
24435   9896 httpd        R    3694 /path/to/repos/db/revs/58/58255
...
------

I have some questions about this:
- Why are the rev files read (multiple times over -> see below), although the log info is contained in the reprops files (which are also read)? These reads must be very costly. Is it because that's the only way for the server to reconstruct the line of history (find out which revision to look up next)? Note that we didn't run "svn log -v", so no need to look up related paths.

- I guess first some headers are read, because the first couple of reads seem small (apart from the very first one) and more or less fixed in size. Right? Maybe this can be optimized?

- It seems the read buffer is limited to 4K. Is there a way to increase this buffer (apart from patching source code and building my own custom server)? Is this somehow configurable? Note: NFS mount options were set to rsize=8192 - other commands (e.g. grep) read blocks of 8192 bytes.

- And then the biggest mystery for me: much more bytes are read from the rev files than necessary. I guess the rev files must be read multiple times (>= 5 times)??
Total bytes read for rev 58301, according to rwsnoop: 62418
-----
$ ls -l /path/to/repos/db/revs/58/58301
-rw-rw----   1 owner   grp        12137 May  9 08:30 58301
-----

Thanks for any insight anyone can offer.

Some context: I'm very much interested in improving performance of "svn log" (and also blame, but that's for another post). IMHO, the context/history of changes (especially with a large and long-lived codebase) is extremely important, and should be as accessible as possible. In my view, that's one of the most important responsibilities of a VCS. All in the spirit of "developers spend much more of their time reading (and understanding) code, than writing it".

Regards,
Johan

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2365532


Re: svn log on FSFS - disk IO for revs/revprops

Posted by Greg Hudson <gh...@mit.edu>.
On Fri, 2009-06-26 at 01:42 +0200, Johan Corveleyn wrote:
> - Why are the rev files read (multiple times over -> see below), although the log info is contained in the reprops files (which are also read)? These reads must be very costly. Is it because that's the only way for the server to reconstruct the line of history (find out which revision to look up next)? Note that we didn't run "svn log -v", so no need to look up related paths.

Because you are running log on a particular path, the rev files must be
consulted to figure out which revisions should be present in the log
output.

FSFS is pretty liberal about opening and seeking around in rev files.
It was designed for deployability and rough scalability, not so much for
raw performance.  (I'll put that last point another way: FSFS shouldn't
slow down too much as repositories get bigger in various ways, with some
exceptions, but it's easy to conceive of other designs which would be,
say, 50% faster at any given size.)

That said, sometimes in the past we've found gross inefficiencies in the
higher-level algorithms for repository operations like "log".  It's
possible that analysis would find one here.  Unfortunately, most of the
tools people have available to do automated analysis (like rwsnoop)
focus on the low-level behavior; to find higher-level algorithmic
inefficiences there is no substitute for understanding the code
involved.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2365730

Re: svn log on FSFS - disk IO for revs/revprops

Posted by Eric Gillespie <ep...@pretzelnet.org>.
At Google, we discovered the same thing:
  http://svn.haxx.se/dev/archive-2007-08/0239.shtml

Hyrum K. Wright <hy...@hyrumwright.org> writes:

> Have you tried this with 1.6.x?  1.6.0 introduced some additional  
> caching of information read from revision files, which should decrease  
> the number of i/o operations.

If you're referring to Glasser's work, we found that it didn't
really help.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2365909

Re: svn log on FSFS - disk IO for revs/revprops

Posted by Eric Gillespie <ep...@wundagore.diplodocus.org>.
At Google, we discovered the same thing:
  http://svn.haxx.se/dev/archive-2007-08/0239.shtml

Hyrum K. Wright <hy...@hyrumwright.org> writes:

> Have you tried this with 1.6.x?  1.6.0 introduced some additional  
> caching of information read from revision files, which should decrease  
> the number of i/o operations.

If you're referring to Glasser's work, we found that it didn't
really help.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2365870

Re: svn log on FSFS - disk IO for revs/revprops

Posted by "Hyrum K. Wright" <hy...@hyrumwright.org>.
On Jun 25, 2009, at 6:42 PM, Johan Corveleyn wrote:

> Hi all,
>
> Disclaimer: if this isn't the right forum to discuss this, please  
> let me know. But I don't think this is a "users" kind of question,  
> so I'm addressing the dev list. Also: I don't know much about svn  
> internals, so forgive me for making any false assumptions or stupid  
> suggestions ...
>
> I have looked at some of the disk IO that the svn server performs  
> when it's asked for an "svn log" (I wanted to see what we could do  
> on OS/disk/NFS level to improve it). For this, we used rwsnoop (http://www.manpagez.com/man/1/rwsnoop/ 
> ), while running "svn log" for a file with lots of changes.
>
> Server:
> 1.5.4 on Solaris 10 (binaries from sunfreeware.com)
> FSFS backend on NFS mounted volume
>
> Exact client command: "svn log URL/file_with_lots_of_changes"
> The file has had 5500 revisions. It takes ~4 minutes to retrieve its  
> entire log.
>
> Here is an excerpt of rwsnoop output:
> ------
> UID      PID CMD          D   BYTES FILE
> 24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
> 24435   9896 httpd        R      64 /path/to/repos/db/revs/58/58301
> 24435   9896 httpd        R     276 /path/to/repos/db/revs/58/58301
> 24435   9896 httpd        R     276 /path/to/repos/db/revs/58/58301
> 24435   9896 httpd        R     412 /path/to/repos/db/revs/58/58301
> 24435   9896 httpd        R     572 /path/to/repos/db/revs/58/58301
> 24435   9896 httpd        R    3699 /path/to/repos/db/revs/58/58301
> 24435   9896 httpd        R    3871 /path/to/repos/db/revs/58/58301
> 24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
> 24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
> 24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
> 24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
> 24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
> 24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
> 24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
> 24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
> 24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
> 24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
> 24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
> 24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58301
> 24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58258
> 24435   9896 httpd        R     131 /path/to/repos/db/revprops/ 
> 58/58301
> 24435   9896 httpd        R      64 /path/to/repos/db/revs/58/58258
> 24435   9896 httpd        R     275 /path/to/repos/db/revs/58/58258
> 24435   9896 httpd        R     275 /path/to/repos/db/revs/58/58258
> 24435   9896 httpd        R     411 /path/to/repos/db/revs/58/58258
> 24435   9896 httpd        R     570 /path/to/repos/db/revs/58/58258
> 24435   9896 httpd        R    3694 /path/to/repos/db/revs/58/58258
> ...
> 24435   9896 httpd        R    4096 /path/to/repos/db/revs/58/58255
> 24435   9896 httpd        R     118 /path/to/repos/db/revprops/ 
> 58/58258
> 24435   9896 httpd        R      64 /path/to/repos/db/revs/58/58255
> 24435   9896 httpd        R     275 /path/to/repos/db/revs/58/58255
> 24435   9896 httpd        R     275 /path/to/repos/db/revs/58/58255
> 24435   9896 httpd        R     411 /path/to/repos/db/revs/58/58255
> 24435   9896 httpd        R     570 /path/to/repos/db/revs/58/58255
> 24435   9896 httpd        R    3694 /path/to/repos/db/revs/58/58255
> ...
> ------
>
> I have some questions about this:
> - Why are the rev files read (multiple times over -> see below),  
> although the log info is contained in the reprops files (which are  
> also read)? These reads must be very costly. Is it because that's  
> the only way for the server to reconstruct the line of history (find  
> out which revision to look up next)? Note that we didn't run "svn  
> log -v", so no need to look up related paths.
>
> - I guess first some headers are read, because the first couple of  
> reads seem small (apart from the very first one) and more or less  
> fixed in size. Right? Maybe this can be optimized?
>
> - It seems the read buffer is limited to 4K. Is there a way to  
> increase this buffer (apart from patching source code and building  
> my own custom server)? Is this somehow configurable? Note: NFS mount  
> options were set to rsize=8192 - other commands (e.g. grep) read  
> blocks of 8192 bytes.
>
> - And then the biggest mystery for me: much more bytes are read from  
> the rev files than necessary. I guess the rev files must be read  
> multiple times (>= 5 times)??
> Total bytes read for rev 58301, according to rwsnoop: 62418
> -----
> $ ls -l /path/to/repos/db/revs/58/58301
> -rw-rw----   1 owner   grp        12137 May  9 08:30 58301
> -----
>
> Thanks for any insight anyone can offer.
>
> Some context: I'm very much interested in improving performance of  
> "svn log" (and also blame, but that's for another post). IMHO, the  
> context/history of changes (especially with a large and long-lived  
> codebase) is extremely important, and should be as accessible as  
> possible. In my view, that's one of the most important  
> responsibilities of a VCS. All in the spirit of "developers spend  
> much more of their time reading (and understanding) code, than  
> writing it".

Have you tried this with 1.6.x?  1.6.0 introduced some additional  
caching of information read from revision files, which should decrease  
the number of i/o operations.

-Hyrum

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2365737

RE: svn log on FSFS - disk IO for revs/revprops

Posted by Johan Corveleyn <jo...@uz.kuleuven.ac.be>.
> -----Oorspronkelijk bericht-----
> Van: David James [mailto:james82@gmail.com]
> Verzonden: vrijdag 26 juni 2009 23:39
> 
> Have you tried setting SVNPathAuthz Off or SVNPathAuthz
> short_circuit
> in your HTTP config? I have seen cases where the the cause of the
> delay was actually due to the overhead associated with performing
> HTTP
> requests to authenticate the paths.

Thanks for the tip, but SVNPathAuthz is Off in this setup.

> 
> You might also want to check out the performance using svn:// or
> file://, because if this is faster it would show you that disk
> activity is probably not the issue. See
> http://svn.haxx.se/users/archive-2005-03/1247.shtml, which shows a
> dramatic performance difference for svn log between http:// and
> svn://
> access.

We also tried with file:// and svn://, but that didn't make much difference (maybe half a minute, so down from 4' to 3'30").

I think big performance differences in the past between svn:// (of file://) and http:// may have been due to SVNPathAuthz. Or maybe quality/speed of the network connection (which is not an issue in our case, it's all on a LAN).

No, I'm really sure that it's purely server-side IO bound.
Also, when running the exact same log command a second time, it's much faster (down to 1'30). That's probably because the server has got the necessary files in disk cache. However, that's no real help, because in a real world scenario people ask logs of different files all the time...

Regards,
Johan

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2366072


Re: svn log on FSFS - disk IO for revs/revprops

Posted by David James <ja...@gmail.com>.
On Thu, Jun 25, 2009 at 4:42 PM, Johan
Corveleyn<jo...@uz.kuleuven.ac.be> wrote:
> I have looked at some of the disk IO that the svn server performs when it's asked for an "svn log" (I wanted to see what we could do on OS/disk/NFS level to improve it). For this, we used rwsnoop (http://www.manpagez.com/man/1/rwsnoop/), while running "svn log" for a file with lots of changes.
...
> Exact client command: "svn log URL/file_with_lots_of_changes"
> The file has had 5500 revisions. It takes ~4 minutes to retrieve its entire log.
...
> Thanks for any insight anyone can offer.

Hi Johan,

Have you tried setting SVNPathAuthz Off or SVNPathAuthz short_circuit
in your HTTP config? I have seen cases where the the cause of the
delay was actually due to the overhead associated with performing HTTP
requests to authenticate the paths.

You might also want to check out the performance using svn:// or
file://, because if this is faster it would show you that disk
activity is probably not the issue. See
http://svn.haxx.se/users/archive-2005-03/1247.shtml, which shows a
dramatic performance difference for svn log between http:// and svn://
access.

Cheers,

David

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2365852