Posted to commits@subversion.apache.org by st...@apache.org on 2012/09/22 14:27:50 UTC

svn commit: r1388786 - /subversion/branches/10Gb/BRANCH-README

Author: stefan2
Date: Sat Sep 22 12:27:49 2012
New Revision: 1388786

URL: http://svn.apache.org/viewvc?rev=1388786&view=rev
Log:
On the 10Gb branch.

* BRANCH-README: clarify goals and impact of this branch

Modified:
    subversion/branches/10Gb/BRANCH-README

Modified: subversion/branches/10Gb/BRANCH-README
URL: http://svn.apache.org/viewvc/subversion/branches/10Gb/BRANCH-README?rev=1388786&r1=1388785&r2=1388786&view=diff
==============================================================================
--- subversion/branches/10Gb/BRANCH-README (original)
+++ subversion/branches/10Gb/BRANCH-README Sat Sep 22 12:27:49 2012
@@ -3,13 +3,19 @@ svn:// single-threaded throughput from a
 10Gb/s for typical source code, i.e. becomes capable of
 saturating a 10Gb connection.
 
+Http:// will speed up by almost the same absolute value,
+1 second being saved per GB of data.  Due to slow processing
+in other places, this gain will be hard to measure, though.
+
 Bottlenecks to address:
 
-* frequent cancellation checks (intense OS interaction)
+* frequent cancellation / abortion checks on the server
+  side (intense OS interaction)
 * in-memory copies (membuffer cache -> empty deltification
   -> output buffer -> network stack)
 * various CPU-heavy tasks
 
 The patches have been written quite some time ago and I
-want them off my disk. OTOH, release 1.8 shall not be
-endangered.
+want them off my disk.  OTOH, release 1.8 shall not be
+endangered.  After testing and stabilization of this branch,
+parts of it may be merged into /trunk before 1.8.



Re: svn commit: r1388786 - /subversion/branches/10Gb/BRANCH-README

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Tue, Sep 25, 2012 at 4:29 PM, Johan Corveleyn <jc...@gmail.com> wrote:

> On Sun, Sep 23, 2012 at 2:33 PM, Stefan Fuhrmann
> <st...@wandisco.com> wrote:
> > On Sat, Sep 22, 2012 at 7:13 PM, Johan Corveleyn <jc...@gmail.com>
> wrote:
> >>
> >> Heh, next question: what are those "slow places" mainly, and do you
> >> have any ideas to speed those up as well? Are there (even only
> >> theoretical) possibilities here? Or would that require major
> >> revamping? Or is it simply theoretically not possible to overcome
> >> certain bottlenecks?
> >
> > It is not entirely clear, yet, where that overhead comes from.
> > However,
> >
> > * IIRC, we use the same reporter on the same granularity,
> >   the server pushes a whole file tree out to the client with no
> >   need for extra roundtrips. But I may be mistaken here.
>
> With 1.8 there will only be ra_serf for http, and that does a separate
> http GET for every file during checkout/update. These requests can go
> in parallel. In most setups, with KeepAlive enabled, TCP connections
> will be reused, but still there will be a certain overhead for every
> http request/response. There is no giant streaming response with an
> entire tree.
>

Thanks for the clarification.


>  > Another thing is that svnserve would be just fine for many
> > use-cases if only it had decent SSPI / ldap support. But
> > that is something we simply need to code. Power users
> > inside a LAN may then use svnserve and more flexible /
> > complicated setups are handled by an Apache server on
> > the same repository.
>
> Ah yes. If somebody could "fix" the auth support in svnserve (in a way
> that really works, as opposed to the current SASL support), that would
> be great :-). That would open up a lot more options for deployment.
>

I guess I should talk to our IT guys when I see
them next month. A non-trivial Windows Domain
setup would help testing such code.


>  > Finally, 1.8 clients are much too slow to do anything useful
> > with that amount of bandwidth. Checksumming alone limits
> > the throughput to ~3Gb/s (for export since it only uses MD5)
> > or even ~1Gb/s (checkout calculates MD5 and SHA1).
> >
> > Future clients will hopefully do much better here.
>
> Indeed. That would make the client again the clear bottleneck :-).
> Besides, even if you checksum at 3Gb/s, you'll need some seriously
> fast hardware to write to persistent storage at such a speed :-).
>

It's Gbits, not GBytes ;) And 300 MB/s write speed is
not entirely outlandish considering today's SSDs.
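The unit conversion behind this exchange is simple arithmetic, sketched here
to make the Gbits-vs-GBytes distinction concrete (the figures come from the
thread; nothing else is assumed):

```python
# Throughput figures in this thread are in gigabits per second (Gb/s);
# disk write speeds are usually quoted in megabytes per second (MB/s).

def gbps_to_mbps(gbps):
    """Convert gigabits/second to megabytes/second (1 byte = 8 bits)."""
    return gbps * 1000 / 8

# Checksumming at ~3 Gb/s moves only 375 MB/s of payload, which is
# within reach of a single SSD of the era; a saturated 10Gb link
# would need 1250 MB/s at the disk.
print(gbps_to_mbps(3))   # 375.0
print(gbps_to_mbps(10))  # 1250.0
```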

-- Stefan^2.

-- 
Join us this October at Subversion Live 2012
<http://www.wandisco.com/svn-live-2012> for two days of best practice
SVN training, networking, live demos, committer meet and greet, and
more! Space is limited, so get signed up today!

Re: svn commit: r1388786 - /subversion/branches/10Gb/BRANCH-README

Posted by Philip Martin <ph...@wandisco.com>.
Johan Corveleyn <jc...@gmail.com> writes:

> On Sun, Sep 23, 2012 at 2:33 PM, Stefan Fuhrmann
>> Another thing is that svnserve would be just fine for many
>> use-cases if only it had decent SSPI / ldap support. But
>> that is something we simply need to code. Power users
>> inside a LAN may then use svnserve and more flexible /
>> complicated setups are handled by an Apache server on
>> the same repository.
>
> Ah yes. If somebody could "fix" the auth support in svnserve (in a way
> that really works, as opposed to the current SASL support), that would
> be great :-). That would open up a lot more options for deployment.

I don't know much about SASL/LDAP/SSPI but I thought svnserve could use
LDAP via SASL.  Are you able to describe what doesn't work but should
work?  If there is not an existing issue perhaps you should raise one.

-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download

Re: svn commit: r1388786 - /subversion/branches/10Gb/BRANCH-README

Posted by Johan Corveleyn <jc...@gmail.com>.
On Sun, Sep 23, 2012 at 2:33 PM, Stefan Fuhrmann
<st...@wandisco.com> wrote:
> On Sat, Sep 22, 2012 at 7:13 PM, Johan Corveleyn <jc...@gmail.com> wrote:
>>
>> On Sat, Sep 22, 2012 at 2:27 PM,  <st...@apache.org> wrote:
>> > Author: stefan2
>> > Date: Sat Sep 22 12:27:49 2012
>> > New Revision: 1388786
>> >
>> > URL: http://svn.apache.org/viewvc?rev=1388786&view=rev
>> > Log:
>> > On the 10Gb branch.
>> >
>> > * BRANCH-README: clarify goals and impact of this branch
>> >
>> > Modified:
>> >     subversion/branches/10Gb/BRANCH-README
>> >
>> > Modified: subversion/branches/10Gb/BRANCH-README
>> > URL:
>> > http://svn.apache.org/viewvc/subversion/branches/10Gb/BRANCH-README?rev=1388786&r1=1388785&r2=1388786&view=diff
>> >
>> > ==============================================================================
>> > --- subversion/branches/10Gb/BRANCH-README (original)
>> > +++ subversion/branches/10Gb/BRANCH-README Sat Sep 22 12:27:49 2012
>> > @@ -3,13 +3,19 @@ svn:// single-threaded throughput from a
>> >  10Gb/s for typical source code, i.e. becomes capable of
>> >  saturating a 10Gb connection.
>> >
>> > +Http:// will speed up by almost the same absolute value,
>> > +1 second being saved per GB of data.  Due to slow processing
>> > +in other places, this gain will be hard to measure, though.
>>
>> Heh, next question: what are those "slow places" mainly, and do you
>> have any ideas to speed those up as well? Are there (even only
>> theoretical) possibilities here? Or would that require major
>> revamping? Or is it simply theoretically not possible to overcome
>> certain bottlenecks?
>
>
> It is not entirely clear, yet, where that overhead comes from.
> However,
>
> * the textual representation is not a problem - there is no
>   significant data overhead in HTTP. Base64 encoding has
>   been limiting in the past and may certainly be tuned much
>   more if need be.
> * IIRC, we use the same reporter on the same granularity,
>   the server pushes a whole file tree out to the client with no
>   need for extra roundtrips. But I may be mistaken here.

With 1.8 there will only be ra_serf for http, and that does a separate
http GET for every file during checkout/update. These requests can go
in parallel. In most setups, with KeepAlive enabled, TCP connections
will be reused, but still there will be a certain overhead for every
http request/response. There is no giant streaming response with an
entire tree.
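The request pattern Johan describes, one GET per file over persistent
connections, with several streams in parallel, can be sketched as follows.
This is only an illustration of the pattern, not ra_serf itself: a throwaway
local server stands in for mod_dav_svn, and the paths and worker counts are
made up.

```python
# Sketch of the ra_serf-style strategy: one HTTP GET per file,
# keep-alive connections, several parallel request streams.
import http.client
import threading
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class FileHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"        # HTTP/1.1 => keep-alive by default
    def do_GET(self):
        body = ("contents of %s\n" % self.path).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):        # silence per-request logging
        pass

# Throwaway stand-in for the real server; port 0 picks a free port.
server = ThreadingHTTPServer(("127.0.0.1", 0), FileHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

def fetch(paths):
    # One keep-alive connection per worker: every GET reuses the same
    # TCP connection, so only the per-request overhead remains.
    conn = http.client.HTTPConnection("127.0.0.1", port)
    sizes = []
    for p in paths:
        conn.request("GET", p)
        sizes.append(len(conn.getresponse().read()))
    conn.close()
    return sizes

files = ["/trunk/file%d.c" % i for i in range(8)]  # hypothetical tree
with ThreadPoolExecutor(max_workers=4) as pool:
    # four parallel request streams, two files each
    results = list(pool.map(fetch, [files[i::4] for i in range(4)]))
server.shutdown()
print(sum(len(r) for r in results))    # 8 responses in total
```

Even with connection reuse, each file still costs a full request/response
round trip on its stream, which is the per-request overhead being discussed.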

> Possible sources for extra load:
>
> * Apache modules packing / unpacking / processing
>   the outgoing data (HTTP/XML tree?)
> * Apache access control modules - even if there is
>   blanket access
> * Fine-grained network communication.
>
> The latter two are a problem because we want to transmit
> 40k files + properties per second.
>
> My gut feeling is that we can address most of the issues
> that we will find and doubling the performance is virtually
> always possible. A stateless protocol like HTTP also
> makes it relatively easy to create parallel request streams
> to increase throughput.
>
> Another thing is that svnserve would be just fine for many
> use-cases if only it had decent SSPI / ldap support. But
> that is something we simply need to code. Power users
> inside a LAN may then use svnserve and more flexible /
> complicated setups are handled by an Apache server on
> the same repository.

Ah yes. If somebody could "fix" the auth support in svnserve (in a way
that really works, as opposed to the current SASL support), that would
be great :-). That would open up a lot more options for deployment.

> Finally, 1.8 clients are much too slow to do anything useful
> with that amount of bandwidth. Checksumming alone limits
> the throughput to ~3Gb/s (for export since it only uses MD5)
> or even ~1Gb/s (checkout calculates MD5 and SHA1).
>
> Future clients will hopefully do much better here.

Indeed. That would make the client again the clear bottleneck :-).
Besides, even if you checksum at 3Gb/s, you'll need some seriously
fast hardware to write to persistent storage at such a speed :-).

-- 
Johan

Re: svn commit: r1388786 - /subversion/branches/10Gb/BRANCH-README

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Sat, Sep 22, 2012 at 7:13 PM, Johan Corveleyn <jc...@gmail.com> wrote:

> On Sat, Sep 22, 2012 at 2:27 PM,  <st...@apache.org> wrote:
> > Author: stefan2
> > Date: Sat Sep 22 12:27:49 2012
> > New Revision: 1388786
> >
> > URL: http://svn.apache.org/viewvc?rev=1388786&view=rev
> > Log:
> > On the 10Gb branch.
> >
> > * BRANCH-README: clarify goals and impact of this branch
> >
> > Modified:
> >     subversion/branches/10Gb/BRANCH-README
> >
> > Modified: subversion/branches/10Gb/BRANCH-README
> > URL:
> http://svn.apache.org/viewvc/subversion/branches/10Gb/BRANCH-README?rev=1388786&r1=1388785&r2=1388786&view=diff
> >
> ==============================================================================
> > --- subversion/branches/10Gb/BRANCH-README (original)
> > +++ subversion/branches/10Gb/BRANCH-README Sat Sep 22 12:27:49 2012
> > @@ -3,13 +3,19 @@ svn:// single-threaded throughput from a
> >  10Gb/s for typical source code, i.e. becomes capable of
> >  saturating a 10Gb connection.
> >
> > +Http:// will speed up by almost the same absolute value,
> > +1 second being saved per GB of data.  Due to slow processing
> > +in other places, this gain will be hard to measure, though.
>
> Heh, next question: what are those "slow places" mainly, and do you
> have any ideas to speed those up as well? Are there (even only
> theoretical) possibilities here? Or would that require major
> revamping? Or is it simply theoretically not possible to overcome
> certain bottlenecks?
>

It is not entirely clear, yet, where that overhead comes from.
However,

* the textual representation is not a problem - there is no
  significant data overhead in HTTP. Base64 encoding has
  been limiting in the past and may certainly be tuned much
  more if need be.
* IIRC, we use the same reporter on the same granularity,
  the server pushes a whole file tree out to the client with no
  need for extra roundtrips. But I may be mistaken here.
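The base64 point in the first bullet is easy to quantify: base64 expands
data by a fixed 4/3 (every 3 payload bytes become 4 ASCII bytes), so the
wire overhead is constant, and raw encoding speed can be measured directly.
A minimal check, using dummy data:

```python
# Quantify base64's fixed expansion factor on a chunk of dummy data.
import base64

payload = b"x" * (3 * 1024 * 1024)     # 3 MiB, a multiple of 3 (no padding)
encoded = base64.b64encode(payload)

# Every 3 input bytes map to 4 output bytes: exactly 4/3 expansion.
print(len(encoded) / len(payload))     # 1.3333...
```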

Possible sources for extra load:

* Apache modules packing / unpacking / processing
  the outgoing data (HTTP/XML tree?)
* Apache access control modules - even if there is
  blanket access
* Fine-grained network communication.

The latter two are a problem because we want to transmit
40k files + properties per second.

My gut feeling is that we can address most of the issues
that we will find and doubling the performance is virtually
always possible. A stateless protocol like HTTP also
makes it relatively easy to create parallel request streams
to increase throughput.

Another thing is that svnserve would be just fine for many
use-cases if only it had decent SSPI / ldap support. But
that is something we simply need to code. Power users
inside a LAN may then use svnserve and more flexible /
complicated setups are handled by an Apache server on
the same repository.

Finally, 1.8 clients are much too slow to do anything useful
with that amount of bandwidth. Checksumming alone limits
the throughput to ~3Gb/s (for export since it only uses MD5)
or even ~1Gb/s (checkout calculates MD5 and SHA1).

Future clients will hopefully do much better here.
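The checksum ceiling described above can be reproduced roughly by timing
MD5 alone (the export case) against MD5 plus SHA-1 (the checkout case)
over a chunk of dummy data. Absolute numbers depend entirely on the
machine and hash implementation; only the relative ordering is the point.

```python
# Rough benchmark: hashing throughput for export (MD5 only)
# versus checkout (MD5 + SHA-1), over in-memory dummy data.
import hashlib
import time

data = b"\x5c" * (64 * 1024 * 1024)      # 64 MiB of dummy file data

def throughput_gbps(hash_names):
    """Gigabits/second achieved when running all named hashes over data."""
    start = time.perf_counter()
    for name in hash_names:
        h = hashlib.new(name)
        h.update(data)
        h.digest()
    elapsed = time.perf_counter() - start
    return len(data) * 8 / elapsed / 1e9

export_speed = throughput_gbps(["md5"])            # export: MD5 only
checkout_speed = throughput_gbps(["md5", "sha1"])  # checkout: MD5 + SHA-1

# Computing two digests per byte necessarily lowers the ceiling.
print(export_speed > checkout_speed)
```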

-- Stefan^2.


Re: svn commit: r1388786 - /subversion/branches/10Gb/BRANCH-README

Posted by Johan Corveleyn <jc...@gmail.com>.
On Sat, Sep 22, 2012 at 2:27 PM,  <st...@apache.org> wrote:
> Author: stefan2
> Date: Sat Sep 22 12:27:49 2012
> New Revision: 1388786
>
> URL: http://svn.apache.org/viewvc?rev=1388786&view=rev
> Log:
> On the 10Gb branch.
>
> * BRANCH-README: clarify goals and impact of this branch
>
> Modified:
>     subversion/branches/10Gb/BRANCH-README
>
> Modified: subversion/branches/10Gb/BRANCH-README
> URL: http://svn.apache.org/viewvc/subversion/branches/10Gb/BRANCH-README?rev=1388786&r1=1388785&r2=1388786&view=diff
> ==============================================================================
> --- subversion/branches/10Gb/BRANCH-README (original)
> +++ subversion/branches/10Gb/BRANCH-README Sat Sep 22 12:27:49 2012
> @@ -3,13 +3,19 @@ svn:// single-threaded throughput from a
>  10Gb/s for typical source code, i.e. becomes capable of
>  saturating a 10Gb connection.
>
> +Http:// will speed up by almost the same absolute value,
> +1 second being saved per GB of data.  Due to slow processing
> +in other places, this gain will be hard to measure, though.

Heh, next question: what are those "slow places" mainly, and do you
have any ideas to speed those up as well? Are there (even only
theoretical) possibilities here? Or would that require major
revamping? Or is it simply theoretically not possible to overcome
certain bottlenecks?

-- 
Johan