
Posted to users@subversion.apache.org by Jonathan Holloway <jo...@gmail.com> on 2012/09/04 16:26:36 UTC

Parallelising SVN Checkout

Hi all,

Has anybody had any success parallelising Subversion checkouts at the
subfolder level to improve performance?

By this I mean using svn sparse directories with --depth immediates to
produce a skeleton structure, such as:

project
* subfoldera - handled by a checkout thread
* subfolderb - handled by a checkout thread

http://svnbook.red-bean.com/en/1.5/svn.advanced.sparsedirs.html

then forking a process/starting a thread to svn update the subfolders?
Does this make sense from a performance point of view, or is the
bottleneck of disk I/O always hit pretty early on by doing this?

http://stackoverflow.com/questions/4160070/can-an-svn-checkout-be-multi-threaded

I've done this with a Python script and the multiprocessing module (and
daemon processes) so far.  I just wanted to check to see if there was an
existing solution to this.

Many thanks,
Jon.
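
The approach described above could be sketched roughly as follows. This is
not the poster's script: the function names are hypothetical, it uses a
thread pool instead of the multiprocessing module (each worker only waits
on an external svn process), and it assumes an svn client on the PATH.
Whether several svn processes may update one working copy at the same time
depends on the client version and working copy format.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def skeleton_cmd(url, dest):
    # Check out the top-level folder plus its immediate children only;
    # the subfolders are created but left empty.
    return ["svn", "checkout", "--depth", "immediates", url, dest]


def fill_cmd(path):
    # Deepen one subfolder of the skeleton to its full contents.
    return ["svn", "update", "--set-depth", "infinity", path]


def parallel_fill(paths, workers=4, run=subprocess.call):
    # Run one "svn update" per subfolder, at most `workers` at a time.
    # `run` is injectable so the logic can be exercised without svn.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(lambda p: run(fill_cmd(p)), paths))
    # Map each subfolder to the exit code of its svn process.
    return dict(zip(paths, codes))
```

A run would first execute skeleton_cmd(url, "project") with
subprocess.call, then pass the subfolder paths (for example
"project/subfoldera" and "project/subfolderb") to parallel_fill and check
that every exit code is 0.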

Re: Parallelising SVN Checkout

Posted by Mark Phippard <ma...@gmail.com>.

On Tue, Sep 4, 2012 at 10:26 AM, Jonathan Holloway <
jonathan.holloway@gmail.com> wrote:

>
> Has anybody had any success parallelising Subversion checkouts at the
> subfolder level to improve performance?
>
> By this I mean using svn sparse directories with --depth immediates to
> produce a skeleton structure, such as:
>
> project
> * subfoldera - handled by a checkout thread
> * subfolderb - handled by a checkout thread
>
> http://svnbook.red-bean.com/en/1.5/svn.advanced.sparsedirs.html
>
> then forking a process/starting a thread to svn update the subfolders?
> Does this make sense from a performance point of view, or is the
> bottleneck of disk I/O always hit pretty early on by doing this?
>
>
> http://stackoverflow.com/questions/4160070/can-an-svn-checkout-be-multi-threaded
>
> I've done this with a Python script and the multiprocessing module (and
> daemon processes) so far.  I just wanted to check to see if there was an
> existing solution to this.
>

You will not be able to do this with SVN 1.7+, as there is a single working
copy admin area that will be locked by the first process that obtains the
lock.  If you are using HTTP(S), you can switch to using the ra_serf
library, which opens multiple connections to the server and fetches multiple
files at once.  In 1.8, ra_serf will be the default HTTP client library.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/
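
One way to try ra_serf for a single checkout on a 1.7 client might look
like the sketch below. It assumes the client was built with serf support
and relies on the http-library option of the servers configuration file,
passed here through --config-option so the user's config is left
untouched. The function name is hypothetical.

```python
def serf_checkout_cmd(url, dest):
    # Select the serf-based HTTP library for this invocation only,
    # in the FILE:SECTION:OPTION=VALUE form that --config-option expects.
    return [
        "svn", "checkout",
        "--config-option", "servers:global:http-library=serf",
        url, dest,
    ]
```

The same setting can be made permanent by adding "http-library = serf" to
the [global] section of the servers file in the Subversion configuration
directory.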

Re: Parallelising SVN Checkout

Posted by Jonathan Holloway <jo...@gmail.com>.

Hi Stefan,

Well, it's early on in the process. The script works, but I've not yet
tested it against a local repository with a suitably sized project.  The
remote Apache Tomcat project (approx 64MB) that I tried checked out
slightly slower with the script.  I haven't delved into I/O stats or
network latency there yet.  It was a simple proof of concept.  I just
wanted to check with everyone here as to whether it made sense to continue
with the investigation.

Judging by what Mark says above, it sounds like this doesn't make sense
with 1.7.  Thanks for the info on the ra_serf library, though.

Many thanks,
Jon.

On 4 September 2012 15:37, Stefan Sperling <st...@elego.de> wrote:

> On Tue, Sep 04, 2012 at 03:26:36PM +0100, Jonathan Holloway wrote:
> > then forking a process/starting a thread to svn update the subfolders?
> > Does this make sense from a performance point of view, or is the
> > bottleneck of disk I/O always hit pretty early on by doing this?
>
> Did you make any performance measurements?
>
> Checkout is mostly I/O bound, so I doubt doing this has much benefit
> unless you have an I/O system that is highly parallelized.
>

Re: Parallelising SVN Checkout

Posted by Stefan Sperling <st...@elego.de>.

On Tue, Sep 04, 2012 at 03:26:36PM +0100, Jonathan Holloway wrote:
> then forking a process/starting a thread to svn update the subfolders?
> Does this make sense from a performance point of view, or is the
> bottleneck of disk I/O always hit pretty early on by doing this?

Did you make any performance measurements?

Checkout is mostly I/O bound, so I doubt doing this has much benefit
unless you have an I/O system that is highly parallelized.
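
A measurement along the lines asked about here could be as simple as
timing each variant's wall-clock duration. The helper below is a minimal
sketch with a hypothetical name; it assumes the command is an argument
list such as ["svn", "checkout", URL, DEST] and that each run starts from
an empty destination directory so the runs are comparable.

```python
import subprocess
import time


def time_run(cmd, run=subprocess.call):
    # Return (exit code, elapsed wall-clock seconds) for one command.
    # `run` is injectable so the helper can be tried without svn.
    start = time.perf_counter()
    code = run(cmd)
    return code, time.perf_counter() - start
```

Repeating each variant several times and comparing the spread, not a
single run, helps separate real differences from network jitter and
filesystem cache effects.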