Posted to user@couchdb.apache.org by Bob Hathaway <bo...@ymail.com> on 2015/07/25 04:50:46 UTC

Re: CouchDB Filtered Replication - Group of Doc's Don't appear to be batched

Our CouchDB replication runs fast without filters to one couch instance.
But from that instance to another, a filter doing a simple check is
syncing about 10x slower.
Looking at couch debug logging, the _bulk_docs Content-Length is 10x
smaller with the filter.
With the filter, the _bulk_docs rate per minute appears identical to the
number of docs synced.
Without the filter, the _bulk_docs rate per minute is 10x less than the
number of docs synced.

It would appear that the couch replication protocol groups 10 docs in a
_bulk_docs POST without a replication filter,
but with the filter there is no grouping and each _bulk_docs POST appears
to contain only a single doc.

Does CouchDB filtered replication skip grouping and send just 1 doc in
each _bulk_docs call to the target host?

Is there some configuration that would allow filtered replication to
group docs and so speed up replication?

-- 
Robert Hathaway
President & Chief Software Architect
SOA Object Systems, LLC
office:  201-408-5828
cell:    201-390-7602
email: rjhsoa@gmail.com

Re: CouchDB Filtered Replication - Group of Doc's Don't appear to be batched

Posted by Alexander Shorin <kx...@gmail.com>.
On Sun, Aug 2, 2015 at 3:25 AM, Adam Kocoloski <ko...@apache.org> wrote:
> The replicator will dynamically choose _bulk_docs batch sizes based on the number of documents that are ready to be transmitted to the target. It’s possible to set an upper bound on the size of the batch, but at this time it’s not possible to set a lower bound.

Actually, the batch size is not configurable[1]; it is strictly limited to
512 KiB. For local targets it is limited to 10 documents.

[1]: https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_worker.erl#L415
[2]: https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator_worker.erl#L29
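
To make the two limits concrete, here is a minimal Python sketch of a buffer that is flushed once it reaches 512 KiB (remote target) or 10 documents (local target). It is only a toy model of the numbers quoted above, not the replicator's Erlang code: the names are invented for this example, and flushing as soon as the limit is reached is an assumption.

```python
# Limits as quoted in this message; the constant and function names are
# hypothetical and exist only for this sketch.
REMOTE_BATCH_BYTE_LIMIT = 512 * 1024  # 512 KiB, remote targets
LOCAL_BATCH_DOC_LIMIT = 10            # documents, local targets


def batch_is_full(doc_count, byte_size, target_is_local):
    """Return True once the buffered batch has reached its upper bound."""
    if target_is_local:
        return doc_count >= LOCAL_BATCH_DOC_LIMIT
    return byte_size >= REMOTE_BATCH_BYTE_LIMIT


def split_into_batches(doc_sizes, target_is_local):
    """Group document sizes (in bytes) into batches, flushing at the limit.

    Assumes every document is already available, so only the upper bound
    shapes the batches.
    """
    batches, current, current_bytes = [], [], 0
    for size in doc_sizes:
        current.append(size)
        current_bytes += size
        if batch_is_full(len(current), current_bytes, target_is_local):
            batches.append(current)
            current, current_bytes = [], 0
    if current:
        batches.append(current)  # flush whatever is left over
    return batches
```

Note that this only models the upper bound. Nothing in it forces a batch to wait until it is full, which matches the observation that batches can shrink to a single document.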

--
,,,^..^,,,

Re: CouchDB Filtered Replication - Group of Doc's Don't appear to be batched

Posted by Adam Kocoloski <ko...@apache.org>.
Hi Bob,

The replicator will dynamically choose _bulk_docs batch sizes based on the number of documents that are ready to be transmitted to the target. It’s possible to set an upper bound on the size of the batch, but at this time it’s not possible to set a lower bound.
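
As a rough illustration of "send whatever is ready, up to a cap", the following Python sketch simulates a sender that never waits to fill a batch. It is a toy model under stated assumptions, not the replicator's implementation: `simulate_batches`, the integer millisecond timings, and the fixed cost per POST are all invented for the example.

```python
def simulate_batches(arrival_times, post_duration, max_batch):
    """Return the size of each batch sent by a 'send what is ready' sender.

    arrival_times: sorted times (ms) at which each document becomes ready,
                   e.g. when it passes the filter.
    post_duration: time (ms) a single bulk POST takes (assumed constant).
    max_batch:     upper bound on documents per batch; there is no lower bound.
    """
    sizes = []
    now = 0
    i = 0
    n = len(arrival_times)
    while i < n:
        # If nothing is ready yet, wait for the next document to arrive.
        now = max(now, arrival_times[i])
        # Take every document that is already ready, up to the cap.
        j = i
        while j < n and arrival_times[j] <= now and j - i < max_batch:
            j += 1
        sizes.append(j - i)
        i = j
        now += post_duration  # the sender is busy while the POST is in flight
    return sizes


# Documents arrive faster than they can be posted: batches fill up to the cap.
fast_producer = simulate_batches([k * 10 for k in range(100)], 100, 10)

# Documents arrive slower than they can be posted: the sender is always idle
# when the next document shows up, so each batch holds a single document.
slow_producer = simulate_batches([k * 100 for k in range(5)], 10, 10)
```

In the second case every batch holds one document, which is the pattern described in the original question.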

It sounds like what’s happening here is that the replicator is faster than the filter function, and that it’s constantly waiting for the next document to pass the filter. One sanity check you might try is to request the filtered _changes feed directly and see what kind of throughput you get. The filtered _changes feed sets the upper bound on the replication throughput you can achieve with that filter.

If the _changes feed is fast but the replication is slow, the next thing you should try to do is minimize the replication-related resource consumption on the source. For example, is the replication mediated by the server hosting the source database, and if so, do you have an opportunity to mediate the replication on a different server?

Cheers,

Adam
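
The sanity check suggested above could be scripted roughly as follows. This is a sketch, not a tested benchmark: it uses the `filter` and `limit` query parameters of the `_changes` endpoint, while the host, database name, and filter name in the usage note are placeholders, and the helper names are invented here.

```python
import json
import time
import urllib.parse
import urllib.request


def changes_url(base_url, db, filter_name=None, limit=1000):
    """Build a _changes URL, optionally with a 'ddoc/filtername' filter."""
    params = {"limit": limit}
    if filter_name:
        params["filter"] = filter_name
    query = urllib.parse.urlencode(params)
    db_part = urllib.parse.quote(db, safe="")
    return f"{base_url.rstrip('/')}/{db_part}/_changes?{query}"


def measure_changes_rate(url, opener=urllib.request.urlopen):
    """Fetch the feed once; return (rows, seconds, rows per second).

    'opener' is injectable so the function can be exercised without a server.
    """
    start = time.monotonic()
    with opener(url) as resp:
        body = json.load(resp)
    elapsed = time.monotonic() - start
    rows = len(body["results"])
    rate = rows / elapsed if elapsed > 0 else float("inf")
    return rows, elapsed, rate
```

For example, calling `measure_changes_rate(changes_url("http://localhost:5984", "mydb", "app/myfilter"))` and then the same call without the filter name gives two rates to compare. If the filtered rate is already about 10x lower, the filter itself is the bottleneck rather than the batching.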
