Posted to dev@subversion.apache.org by Marc-Antoine Ruel <ma...@gmail.com> on 2008/01/14 00:23:05 UTC

Feature request: pipelining checkout and update

Hi

Use case:
I'm syncing a lot of small files, many hundreds of megabytes' worth.
I'm on WiFi + VPN + far away, so ping is > 150 ms. Checkout is rather
slow even though the pipe is > 5 Mbit/s.

Hypothesis:
The sync is limited by the latency to the server, not by the actual
bandwidth capacity. If the client synced many files concurrently, in a
pipeline, it would inherently go much faster.
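
To make that concrete, here is a rough back-of-envelope estimate using
the numbers from the test below (my own simplification: it assumes one
full round trip per file, which is not exactly what the client does):

    # Rough estimate: latency-bound vs bandwidth-bound checkout time.
    # Assumes one round trip per file, which is a simplification.
    files = 261 + 1728                  # files in dir1 + dir2
    total_bytes = 13334007 + 30763364   # bytes in dir1 + dir2
    rtt = 0.150                         # measured ping, in seconds
    bandwidth = 5e6 / 8                 # 5 Mbit/s, in bytes per second

    latency_bound = files * rtt                # ~298 s of pure waiting
    bandwidth_bound = total_bytes / bandwidth  # ~71 s to move the bytes
    print(latency_bound, bandwidth_bound)

The measured times below are much closer to the latency-bound figure
than to the bandwidth-bound one.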

Testing:
I tried a reduced case to see if there is immediate room for
improvement. I took a directory that contained only dir1 and dir2. This
was on Windows XP, over https, with an svn 1.5.0 trunk build; I don't
know the exact revision (sorry).

dir1:  261 File(s),  41 Dir(s), 13334007 bytes
dir2: 1728 File(s), 770 Dir(s), 30763364 bytes

As you can see, the directories aren't evenly balanced, but they are
still sufficient to show my point. I wanted two different sets of files
to be sure I wasn't helped by any kind of duplicate detection.

So my tests are (a sketch for reproducing the concurrent run follows
the results):
- Updating the directory that contains both subdirectories: 322 seconds.
- Updating both subdirectories, one after the other: 323 seconds (101
seconds for dir1 and 222 seconds for dir2; process startup plus the
initial https connection adds about 1 second of overhead).
- Updating both subdirectories at the same time: 89 seconds (dir1) and
216 seconds (dir2).
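
For reference, the concurrent run can be reproduced with something like
this minimal sketch (dir1 and dir2 are the working-copy paths above):

    # Minimal sketch: run "svn up" on both subdirectories at the same
    # time, timing each one independently.
    import subprocess
    import threading
    import time

    def timed_update(path):
        start = time.time()
        subprocess.run(["svn", "up", path], check=True)
        print(path, "took", round(time.time() - start), "seconds")

    threads = [threading.Thread(target=timed_update, args=(p,))
               for p in ("dir1", "dir2")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()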

I couldn't believe that it was faster to run two checkouts at once than
one at a time, so I ran the second and third tests again, in reverse
order.

Updating both subdirectories at the same time: 111 seconds (dir1) and
206 seconds (dir2).
Updating both subdirectories, one after the other: 341 seconds (116
seconds for dir1 and 225 seconds for dir2).

Analysis:
As you can see, the absolute error is very high (>30 seconds!), but it
is nevertheless clear that each of two checkouts run in parallel is
about as fast as the same checkout run alone, which means:
- Neither the server, the client, nor the bandwidth is the limiting factor.
- The limiting factor is something else: the latency to fetch each file.

Conclusion:
By pipelining the checkout, i.e. keeping requests for many files in
flight at once, svn would hide most of that latency.
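
To illustrate the idea (a conceptual sketch only, not svn's actual
protocol or API; fetch_file is a hypothetical stand-in for one file
transfer):

    # Conceptual sketch of pipelining: keep several requests in flight
    # so the per-file round trips overlap instead of adding up.
    from concurrent.futures import ThreadPoolExecutor

    def fetch_file(path):
        ...  # hypothetical: one request/response round trip per file

    def pipelined_checkout(paths, in_flight=8):
        # With 8 requests in flight, the effective latency cost per
        # file drops from one full RTT toward RTT / 8.
        with ThreadPoolExecutor(max_workers=in_flight) as pool:
            list(pool.map(fetch_file, paths))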

Thanks

Marc-Antoine


Re: Feature request: pipelining checkout and update

Posted by David Glasser <gl...@davidglasser.net>.
Also, perhaps svnserve would perform better for you.

--dave

On Jan 13, 2008 7:52 PM, Ben Collins-Sussman <su...@red-bean.com> wrote:
> Um, have you tried ra_serf yet?  It's our new pipelining HTTP client for svn.

-- 
David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/


Re: Feature request: pipelining checkout and update

Posted by Marc-Antoine Ruel <ma...@gmail.com>.
I didn't know about this. My bad. I looked at my svn build and it didn't
have serf included. I'll retest with that.

Thanks for the insight.

M-A

2008/1/13, Ben Collins-Sussman <su...@red-bean.com>:
>
> Um, have you tried ra_serf yet?  It's our new pipelining HTTP client for
> svn.

Re: Feature request: pipelining checkout and update

Posted by Ben Collins-Sussman <su...@red-bean.com>.
Um, have you tried ra_serf yet?  It's our new pipelining HTTP client for svn.
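
For anyone trying this with a serf-enabled 1.5-era client: the HTTP
library can be selected in the runtime config, assuming your build was
compiled with serf support. In ~/.subversion/servers:

    [global]
    http-library = serf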

