Posted to replication@couchdb.apache.org by Jens Alfke <je...@couchbase.com> on 2014/01/24 16:20:55 UTC

_bulk_get protocol extension

(I'm excited about this list! There have been some topics I've wanted to bring up that are too implementation-oriented for the user@ list, but I haven't been brave enough to dive into the dev@ list because I don't know Erlang or the internals of CouchDB. I also really appreciate folks sharing the viewpoint that CouchDB is an ecosystem and an open replication protocol, not just a particular database implementation.)

Anyway. One topic I'd like to bring up is that, in my non-scientific observations, the major performance bottleneck in pull replications is the fact that revisions have to be transferred using individual GET requests. I've seen very poor performance when pulling lots of small documents from a distant server, like an order of magnitude below the throughput of sending a single huge document.

(Yes, it's possible to get multiple revisions at once by POSTing to _all_docs. Unfortunately this has limitations that make it unsuitable for replication; see my explanation at the page linked below.)

A few months ago I experimentally implemented a new "_bulk_get" REST call in Couchbase's replicators (Couchbase Lite and the Sync Gateway), which significantly improves performance by allowing the puller to request any number of revisions in a single HTTP request. Again, no scientific tests or hard numbers, but it was enough to convince me it's worthwhile. I've documented it here:
	https://github.com/couchbase/sync_gateway/wiki/Bulk-GET
It's pretty straightforward and I've tried to make it consistent with the standard API. The only unusual thing is that the response can contain nested MIME multipart bodies: the response format is multipart, with every requested revision in a part, but revisions containing attachments are themselves sent as multipart. (This shouldn't be an issue for any decent multipart parser, since nested multipart is pretty common in emails, but I think it's the first time it's happened in the CouchDB API.)
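
For anyone who'd rather see a concrete example than read the spec, here's roughly what a single pull-side request could look like. This is only a sketch: the host, database name, document IDs, and revisions below are made up, and the body shape follows the {"docs": [...]} form described at the link above.

import requests

# Hypothetical Sync Gateway endpoint; substitute a real database URL.
url = "http://sync-gateway.example.com:4984/mydb/_bulk_get"
body = {
    "docs": [
        {"id": "doc1", "rev": "2-abc123"},
        {"id": "doc2", "rev": "1-def456"},
    ]
}

# Ask for a multipart response: each part carries one requested revision, and
# a revision that has attachments is itself a nested multipart body.
resp = requests.post(url, json=body, headers={"Accept": "multipart/mixed"})
print(resp.status_code, resp.headers.get("Content-Type"))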

I'd be happy if this were implemented in CouchDB and made an official part of the API. Hopefully the spec I wrote is detailed enough to make that straightforward. (I don't have the Erlang skills to do it myself, though.)

—Jens

Re: _bulk_get protocol extension

Posted by Adam Kocoloski <ko...@apache.org>.
The replicator knows how to pipeline, yes, but the server considers each pipelined request separately, whereas theoretically with a _bulk_get it could batch the btree operations as well (and eliminate some redundant inner node lookups).  I'm also not sure how efficient the ibrowse client's implementation of pipelining is in the end -- I've not observed the kinds of speedups that I'd expect with it.

Jens, I believe Apache CouchDB already does handle nested multipart on the PUT side (multiple revisions of a document with attachments), but I'll admit the code to do so is rather difficult to grok and could really benefit from a refactor, which would certainly enable a nice implementation of something like _bulk_get.

Adam

On Jan 24, 2014, at 12:35 PM, Yaron Goland <ya...@microsoft.com> wrote:

> In the HTTP WG more than a decade ago issues like this came up under the name 'boxcar'ing'. But with the introduction of pipelining the performance benefits of boxcar'ing for idempotent requests went away. 
> 
> In a replication the source should be able to fire off GET requests down the pipeline non-stop and the remote server should be able to return them just as quickly. So have you identified why you are seeing bad performance?
> 
> 	Thanks,
> 
> 			Yaron
> 
>> -----Original Message-----
>> From: Jens Alfke [mailto:jens@couchbase.com]
>> Sent: Friday, January 24, 2014 7:21 AM
>> To: replication@couchdb.apache.org
>> Subject: _bulk_get protocol extension
>> 
>> (I'm excited about this list! There have been some topics I've wanted to bring
>> up that are too implementation-oriented for the user@ list, but I haven't
>> been brave enough to dive into the dev@ list because I don't know Erlang or
>> the internals of CouchDB. I also really appreciate folks sharing the viewpoint
>> that CouchDB is an ecosystem and an open replication protocol, not just a
>> particular database implementation.)
>> 
>> Anyway. One topic I'd like to bring up is that, in my non-scientific
>> observations, the major performance bottleneck in pull replications is the
>> fact that revisions have to be transferred using individual GET requests. I've
>> seen very poor performance when pulling lots of small documents from a
>> distant server, like an order of magnitude below the throughput of sending a
>> single huge document.
>> 
>> (Yes, it's possible to get multiple revisions at once by POSTing to _all_docs.
>> Unfortunately this has limitations that make it unsuitable for replication; see
>> my explanation at the page linked below.)
>> 
>> A few months ago I experimentally implemented a new "_bulk_get" REST call
>> in Couchbase's replicators (Couchbase Lite and the Sync Gateway), which
>> significantly improves performance by allowing the puller to request any
>> number of revisions in a single HTTP request. Again, no scientific tests or hard
>> numbers, but it was enough to convince me it's worthwhile. I've
>> documented it here:
>> 	https://github.com/couchbase/sync_gateway/wiki/Bulk-GET
>> It's pretty straightforward and I've tried to make it consistent with the
>> standard API. The only unusual thing is that the response can contain nested
>> MIME multipart bodies: the response format is multipart, with every
>> requested revision in a part, but revisions containing attachments are
>> themselves sent as multipart. (This shouldn't be an issue for any decent
>> multipart parser, since nested multipart is pretty common in emails, but I
>> think it's the first time it's happened in the CouchDB API.)
>> 
>> I'd be happy if this were implemented in CouchDB and made an official part
>> of the API. Hopefully the spec I wrote is detailed enough to make that
>> straightforward. (I don't have the Erlang skills to do it myself, though.)
>> 
>> -Jens


Re: _bulk_get protocol extension

Posted by Jens Alfke <je...@couchbase.com>.
On Jan 28, 2014, at 8:52 AM, Yaron Goland <ya...@microsoft.com> wrote:

> I did read it and I didn't agree with it.

Ilya Grigorik works on performance on the Chrome team at Google, so I'm inclined to trust him on statements about practical aspects of HTTP. (I worked on Chrome for a year+ but not on HTTP-level stuff.)

>> 	* A single slow response blocks all requests behind it.
> The same is true of bulk get.

No, because a bulk_get response doesn't have to return the documents in the same order they're requested. It can fetch them all in parallel if it wants, and send them out in the order they're ready.
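
For example, a server-side handler could stream the parts out in completion order, something like this hypothetical sketch (not how the Sync Gateway actually implements it; the revision fetcher below just simulates variable storage latency):

import asyncio
import json
import random
import uuid

async def fetch_revision(doc_id, rev):
    # Stand-in for a real storage read; simulate variable latency.
    await asyncio.sleep(random.random())
    return {"_id": doc_id, "_rev": rev}

async def bulk_get_parts(requested):
    boundary = uuid.uuid4().hex
    tasks = [asyncio.create_task(fetch_revision(d["id"], d["rev"])) for d in requested]
    # Emit each multipart part as soon as its revision is ready, regardless of
    # the order in which the revisions were requested.
    for done in asyncio.as_completed(tasks):
        doc = await done
        yield (f"--{boundary}\r\nContent-Type: application/json\r\n\r\n"
               + json.dumps(doc) + "\r\n")
    yield f"--{boundary}--\r\n"

async def main():
    requested = [{"id": f"doc{i}", "rev": "1-abc"} for i in range(5)]
    async for part in bulk_get_parts(requested):
        print(part, end="")

asyncio.run(main())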

I really don't want to get into an argument about pipelining; I'll just point to the entry on browser compatibility in the Wikipedia article, which shows many browsers not supporting it due to issues like head-of-line blocking and buggy gateways:
	http://en.wikipedia.org/wiki/HTTP_pipelining#Implementation_in_web_browsers

> Your first argument is that the overhead of GET is so bad that even in the face of pipelining the performance will still be significantly worse than a bulk request. Well you said you already implemented bulk requests. So um... why not publish some numbers and the code you used to generate it?

I implemented _bulk_get in the Couchbase Sync Gateway, not in CouchDB (I don't work on CouchDB). I doubt the code would be of interest to people here. :)

Before I take the time to set up and run tests and publish numbers, I'd like to know whether that effort would make a difference to people considering whether to implement this API call.

—Jens

RE: _bulk_get protocol extension

Posted by Yaron Goland <ya...@microsoft.com>.
I did read it and I didn't agree with it.

> 	* A single slow response blocks all requests behind it.

The same is true of bulk get. Remember, the only things that can be pipelined are idempotent requests, which generally means GETs. So just as a single slow GET can stall the whole pipeline, a single slow 'virtual' GET inside a bulk GET request can stall the response as well.

> 	* When processing in parallel, servers must buffer pipelined
> responses, which may exhaust server resources-e.g., what if one of the
> responses is very large? This exposes an attack vector against the server!

This is the whole point of flow control in TCP. The server only pulls off what it can handle. If a client sends more requests than the server can handle, the server stops draining its receive buffer and TCP flow control automatically pushes back on the client.

Put another way, if this attack works then a client can replicate it without pipelining just by making multiple independent requests. So either a server protects itself from DoS by clients or it doesn't; pipelining doesn't change anything.

> 	* A failed response may terminate the TCP connection, forcing the
> client to re-request all the subsequent resources, which may cause duplicate
> processing.

Certainly nothing in HTTP requires such termination, so what this point really says is 'bad clients will throw exceptions on non-200 responses'. Well, bad clients are going to do a lot of silly things. If they use decent libraries (e.g. Apache, .NET, etc.) this isn't a problem, because the exception won't terminate the connection; the connection is actually part of a pool and is managed separately.

So yes, bad clients will do bad things, but that applies no matter what, so I don't see it as worth worrying about.

> 	* Detecting pipelining compatibility reliably, where intermediaries
> may be present, is a nontrivial problem.

Pipelining is point to point, not end to end. In other words, if the intermediary returns 1.1 responses then it is a 1.1 intermediary; otherwise its job is to return 1.0 even if the upstream system it's talking to is 1.1. So pipelining happens hop by hop, and each hop only needs to probe its next hop.

> 	* Some intermediaries do not support pipelining and may abort the
> connection, while others may serialize all requests.

Intermediaries that don't support pipelining publish 1.0 for just that reason. And serialization is always a possibility but the server can do the same serialization. So yes, bad infrastructure is bad infrastructure. But that isn't a reason to abandon the protocol and invent a new protocol to crawl through the old one.

So personally I'm having trouble buying the protocol argument. But you make two arguments in your email that seem well positioned to lead to a really productive conversation.

Your first argument is that the overhead of GET is so bad that even in the face of pipelining the performance will still be significantly worse than a bulk request. Well, you said you already implemented bulk requests. So, um... why not publish some numbers and the code you used to generate them?

The same argument applies to compression and the benefits of compressing similar data together. You said you already have this up and running. So why not just publish some numbers comparing a non-pipelined connection, a pipelined connection, and your bulk GET? You can show latency, bandwidth, and CPU load.

I suspect those numbers would make for a more productive conversation.

	Thanks,

			Yaron

> -----Original Message-----
> From: Jens Alfke [mailto:jens@couchbase.com]
> Sent: Monday, January 27, 2014 9:13 PM
> To: replication@couchdb.apache.org
> Subject: Re: _bulk_get protocol extension
> 
> 
> On Jan 27, 2014, at 7:26 PM, Yaron Goland <ya...@microsoft.com> wrote:
> 
> > Nevertheless he did say that so long as one probes the connection then
> pipelining is known to work. Probing just means that you can't assume that
> the server you are talking to is a 1.1 server and therefore supports pipelining.
> 
> Well, yes, that's pretty clear - I mean, I know pipelining's been
> implemented. (And on iOS and Mac the frameworks already know how to
> support pipelining, so one doesn't have to do the probing oneself.)
> 
> The problems with pipelining are higher level than that. Did you read the text
> by Ilya Grigorik that I linked to? Here's another excerpt:
> 
> 	* A single slow response blocks all requests behind it.
> 	* When processing in parallel, servers must buffer pipelined
> responses, which may exhaust server resources-e.g., what if one of the
> responses is very large? This exposes an attack vector against the server!
> 	* A failed response may terminate the TCP connection, forcing the
> client to re-request all the subsequent resources, which may cause duplicate
> processing.
> 	* Detecting pipelining compatibility reliably, where intermediaries
> may be present, is a nontrivial problem.
> 	* Some intermediaries do not support pipelining and may abort the
> connection, while others may serialize all requests.
> - http://chimera.labs.oreilly.com/books/1230000000545/ch11.html#HTTP_PIPELINING
> 
> (Now, HTTP 2.0 is adding multiplexing, which alleviates most of those
> problems. I'll be happy when we get to use it, but that probably won't be for
> a year or two at least.)
> 
> I also mentioned the overhead of issuing a bunch of HTTP requests versus
> just one. As a thought experiment, consider fetching a one-megabyte HTTP
> resource by using a thousand byte-range GET requests each requesting 1K of
> the file. Would this take longer than issuing a single GET request for the
> entire resource? Yeah, and probably a lot longer, even with pipelining. The
> client and the server both introduce overhead in handling requests.
> 
> Finally, consider that putting a number of related resources together into a
> single body enables better compression, since general-purpose compression
> algorithms look for repeated patterns. If I have a thousand small documents
> each of which contains a property named "this_is_my_custom_property",
> then if all those documents are returned in one response each instance of
> that string will get compressed down to a very short token. If they're
> separate responses, the string won't get compressed.
> 
> -Jens

Re: _bulk_get protocol extension

Posted by Jens Alfke <je...@couchbase.com>.
On Jan 27, 2014, at 7:26 PM, Yaron Goland <ya...@microsoft.com> wrote:

> Nevertheless he did say that so long as one probes the connection then pipelining is known to work. Probing just means that you can't assume that the server you are talking to is a 1.1 server and therefore supports pipelining.

Well, yes, that's pretty clear — I mean, I know pipelining's been implemented. (And on iOS and Mac the frameworks already know how to support pipelining, so one doesn't have to do the probing oneself.)

The problems with pipelining are higher level than that. Did you read the text by Ilya Grigorik that I linked to? Here's another excerpt:

	• A single slow response blocks all requests behind it.
	• When processing in parallel, servers must buffer pipelined responses, which may exhaust server resources—e.g., what if one of the responses is very large? This exposes an attack vector against the server!
	• A failed response may terminate the TCP connection, forcing the client to re-request all the subsequent resources, which may cause duplicate processing.
	• Detecting pipelining compatibility reliably, where intermediaries may be present, is a nontrivial problem.
	• Some intermediaries do not support pipelining and may abort the connection, while others may serialize all requests.
— http://chimera.labs.oreilly.com/books/1230000000545/ch11.html#HTTP_PIPELINING

(Now, HTTP 2.0 is adding multiplexing, which alleviates most of those problems. I'll be happy when we get to use it, but that probably won't be for a year or two at least.)

I also mentioned the overhead of issuing a bunch of HTTP requests versus just one. As a thought experiment, consider fetching a one-megabyte HTTP resource by using a thousand byte-range GET requests each requesting 1K of the file. Would this take longer than issuing a single GET request for the entire resource? Yeah, and probably a lot longer, even with pipelining. The client and the server both introduce overhead in handling requests.
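
To make that thought experiment concrete, here's a rough sketch of how one could measure it. The URL is hypothetical and the absolute numbers will depend entirely on the network and server involved; the point is just that each slice costs a full HTTP request/response round trip (and the requests library below issues them sequentially over one keep-alive connection, without pipelining).

import time
import requests

URL = "http://example.com/one-megabyte-resource"   # hypothetical 1 MB resource

session = requests.Session()   # reuses a single keep-alive connection

start = time.time()
whole = session.get(URL).content
print("single GET:     ", time.time() - start, "seconds")

start = time.time()
chunks = []
for i in range(1024):
    # Fetch the file 1 KB at a time; every slice is its own round trip.
    headers = {"Range": f"bytes={i * 1024}-{i * 1024 + 1023}"}
    chunks.append(session.get(URL, headers=headers).content)
print("1024 range GETs:", time.time() - start, "seconds")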

Finally, consider that putting a number of related resources together into a single body enables better compression, since general-purpose compression algorithms look for repeated patterns. If I have a thousand small documents each of which contains a property named "this_is_my_custom_property", then if all those documents are returned in one response each instance of that string will get compressed down to a very short token. If they're separate responses, the string won't get compressed.
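
A quick way to see this effect is to compress the same thousand documents once as a single body and once individually. This is just an illustrative sketch using Python's zlib (a DEFLATE-based compressor, like gzip); the documents are made up.

import json
import zlib

docs = [{"this_is_my_custom_property": i} for i in range(1000)]

# One body: the repeated property name compresses down to short back-references.
together = len(zlib.compress(json.dumps(docs).encode()))

# Separate bodies: each one has to spell the property name out in full.
separate = sum(len(zlib.compress(json.dumps(d).encode())) for d in docs)

print("one compressed body:    ", together, "bytes")
print("sum of separate bodies: ", separate, "bytes")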

—Jens

RE: _bulk_get protocol extension

Posted by Yaron Goland <ya...@microsoft.com>.
Sorry for the delay in getting back to this mail thread. I talked with Mark Nottingham, chair of the HTTP WG in the IETF, who, to be clear, is in no way responsible for the content of this email.

Nevertheless he did say that so long as one probes the connection then pipelining is known to work. Probing just means that you can't assume that the server you are talking to is a 1.1 server and therefore supports pipelining. So you have to make the first request, get back a 1.1 response and only then make the second and subsequent requests with pipelining.
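
As a rough illustration of what I mean by probing, here is a minimal sketch over a raw socket. The hostname and paths are made up, and proper response parsing (Content-Length handling and so on) is omitted; a real client would let its HTTP library do all of this.

import socket

HOST, PORT = "example.com", 80   # hypothetical server

def get(path):
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {HOST}\r\n"
            "Connection: keep-alive\r\n\r\n").encode()

sock = socket.create_connection((HOST, PORT))
try:
    # Probe: send a single request first and check the protocol version.
    sock.sendall(get("/db/doc1"))
    first = sock.recv(65536)
    if first.startswith(b"HTTP/1.1"):
        # The server answered as 1.1, so pipeline the remaining requests on
        # this connection without waiting for each response.
        sock.sendall(get("/db/doc2") + get("/db/doc3"))
        # ... read the remaining responses in order (parsing omitted) ...
finally:
    sock.close()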

I would think it would be easier to teach people how to do probing than to introduce a completely new compound message format. But I recognize that is just my opinion, not a fact.

So please just take this as a comment from the peanut gallery.

		Thanks,

			Yaron

> -----Original Message-----
> From: Jens Alfke [mailto:jens@couchbase.com]
> Sent: Friday, January 24, 2014 11:36 AM
> To: replication@couchdb.apache.org
> Subject: Re: _bulk_get protocol extension
> 
> 
> On Jan 24, 2014, at 9:35 AM, Yaron Goland <ya...@microsoft.com> wrote:
> 
> > In the HTTP WG more than a decade ago issues like this came up under the
> name 'boxcar'ing'. But with the introduction of pipelining the performance
> benefits of boxcar'ing for idempotent requests went away.
> 
> It's not that simple. Pipelining has problems, as described by Ilya Grigorik in
> his excellent new book "High Performance Browser Networking":
> 
> >> In practice, due to lack of multiplexing, HTTP pipelining creates
> >> many subtle and undocumented implications for HTTP servers,
> intermediaries, and clients: [...] Due to these and similar complications, and
> lack of guidance in the HTTP 1.1 standard for these cases, HTTP pipelining
> adoption has remained very limited despite its many benefits. Today, some
> browsers support pipelining, usually as an advanced configuration option, but
> most have it disabled.
> 
> - http://chimera.labs.oreilly.com/books/1230000000545/ch11.html#HTTP_PIPELINING
> 
> In my case, pipelining is off by default in Apple's HTTP client framework, and
> I've been loath to turn it on for reasons like those Grigorik describes. (Also,
> IIRC I tried turning it on a year or so ago and ran some replication
> performance tests, and didn't see noticeable improvements.)
> 
> Coalescing everything into one request also removes the overhead of
> generating and parsing each HTTP request/response, on both client and
> server sides.
> 
> -Jens

Re: _bulk_get protocol extension

Posted by Jens Alfke <je...@couchbase.com>.
On Jan 24, 2014, at 9:35 AM, Yaron Goland <ya...@microsoft.com> wrote:

> In the HTTP WG more than a decade ago issues like this came up under the name 'boxcar'ing'. But with the introduction of pipelining the performance benefits of boxcar'ing for idempotent requests went away. 

It's not that simple. Pipelining has problems, as described by Ilya Grigorik in his excellent new book "High Performance Browser Networking":

>> In practice, due to lack of multiplexing, HTTP pipelining creates many subtle and undocumented implications for HTTP servers, intermediaries, and clients: […]
>> Due to these and similar complications, and lack of guidance in the HTTP 1.1 standard for these cases, HTTP pipelining adoption has remained very limited despite its many benefits. Today, some browsers support pipelining, usually as an advanced configuration option, but most have it disabled.

— http://chimera.labs.oreilly.com/books/1230000000545/ch11.html#HTTP_PIPELINING

In my case, pipelining is off by default in Apple's HTTP client framework, and I've been loath to turn it on for reasons like those Grigorik describes. (Also, IIRC I tried turning it on a year or so ago and ran some replication performance tests, and didn't see noticeable improvements.)

Coalescing everything into one request also removes the overhead of generating and parsing each HTTP request/response, on both client and server sides.

—Jens

RE: _bulk_get protocol extension

Posted by Yaron Goland <ya...@microsoft.com>.
In the HTTP WG more than a decade ago issues like this came up under the name 'boxcar'ing'. But with the introduction of pipelining the performance benefits of boxcar'ing for idempotent requests went away. 

In a replication the source should be able to fire off GET requests down the pipeline non-stop and the remote server should be able to return them just as quickly. So have you identified why you are seeing bad performance?

	Thanks,

			Yaron

> -----Original Message-----
> From: Jens Alfke [mailto:jens@couchbase.com]
> Sent: Friday, January 24, 2014 7:21 AM
> To: replication@couchdb.apache.org
> Subject: _bulk_get protocol extension
> 
> (I'm excited about this list! There have been some topics I've wanted to bring
> up that are too implementation-oriented for the user@ list, but I haven't
> been brave enough to dive into the dev@ list because I don't know Erlang or
> the internals of CouchDB. I also really appreciate folks sharing the viewpoint
> that CouchDB is an ecosystem and an open replication protocol, not just a
> particular database implementation.)
> 
> Anyway. One topic I'd like to bring up is that, in my non-scientific
> observations, the major performance bottleneck in pull replications is the
> fact that revisions have to be transferred using individual GET requests. I've
> seen very poor performance when pulling lots of small documents from a
> distant server, like an order of magnitude below the throughput of sending a
> single huge document.
> 
> (Yes, it's possible to get multiple revisions at once by POSTing to _all_docs.
> Unfortunately this has limitations that make it unsuitable for replication; see
> my explanation at the page linked below.)
> 
> A few months ago I experimentally implemented a new "_bulk_get" REST call
> in Couchbase's replicators (Couchbase Lite and the Sync Gateway), which
> significantly improves performance by allowing the puller to request any
> number of revisions in a single HTTP request. Again, no scientific tests or hard
> numbers, but it was enough to convince me it's worthwhile. I've
> documented it here:
> 	https://github.com/couchbase/sync_gateway/wiki/Bulk-GET
> It's pretty straightforward and I've tried to make it consistent with the
> standard API. The only unusual thing is that the response can contain nested
> MIME multipart bodies: the response format is multipart, with every
> requested revision in a part, but revisions containing attachments are
> themselves sent as multipart. (This shouldn't be an issue for any decent
> multipart parser, since nested multipart is pretty common in emails, but I
> think it's the first time it's happened in the CouchDB API.)
> 
> I'd be happy if this were implemented in CouchDB and made an official part
> of the API. Hopefully the spec I wrote is detailed enough to make that
> straightforward. (I don't have the Erlang skills to do it myself, though.)
> 
> -Jens

Re: _bulk_get protocol extension

Posted by Dave Cottlehuber <dc...@jsonified.com>.
Hey Jens,

that looks interesting indeed. Worth posting a jira ticket with the
link, so it doesn't get lost in email.

A+
Dave

On 24 January 2014 16:20, Jens Alfke <je...@couchbase.com> wrote:
> (I'm excited about this list! There have been some topics I've wanted to bring up that are too implementation-oriented for the user@ list, but I haven't been brave enough to dive into the dev@ list because I don't know Erlang or the internals of CouchDB. I also really appreciate folks sharing the viewpoint that CouchDB is an ecosystem and an open replication protocol, not just a particular database implementation.)
>
> Anyway. One topic I'd like to bring up is that, in my non-scientific observations, the major performance bottleneck in pull replications is the fact that revisions have to be transferred using individual GET requests. I've seen very poor performance when pulling lots of small documents from a distant server, like an order of magnitude below the throughput of sending a single huge document.
>
> (Yes, it's possible to get multiple revisions at once by POSTing to _all_docs. Unfortunately this has limitations that make it unsuitable for replication; see my explanation at the page linked below.)
>
> A few months ago I experimentally implemented a new "_bulk_get" REST call in Couchbase's replicators (Couchbase Lite and the Sync Gateway), which significantly improves performance by allowing the puller to request any number of revisions in a single HTTP request. Again, no scientific tests or hard numbers, but it was enough to convince me it's worthwhile. I've documented it here:
>         https://github.com/couchbase/sync_gateway/wiki/Bulk-GET
> It's pretty straightforward and I've tried to make it consistent with the standard API. The only unusual thing is that the response can contain nested MIME multipart bodies: the response format is multipart, with every requested revision in a part, but revisions containing attachments are themselves sent as multipart. (This shouldn't be an issue for any decent multipart parser, since nested multipart is pretty common in emails, but I think it's the first time it's happened in the CouchDB API.)
>
> I'd be happy if this were implemented in CouchDB and made an official part of the API. Hopefully the spec I wrote is detailed enough to make that straightforward. (I don't have the Erlang skills to do it myself, though.)
>
> —Jens