Posted to dev@couchdb.apache.org by Randall Leeds <ra...@gmail.com> on 2010/04/07 23:46:06 UTC

ibrowse queue

Thanks to Filipe's patch, the replicator now has configurable
concurrency and pipeline length.

ibrowse enforces these two limits, returning {error, retry_later}
when no connection has space available in its pipeline.
The limits are set per host-port combination. As such, they are
shared across all concurrent replications to or from the same host.

In preparation for lowering the concurrency in our production setup, I
was reasoning through potential problems and came upon the following
scenario:

Due to
  1) low pipeline/concurrency settings
  2) many concurrent replications to/from the same host
ibrowse could return a lot of {error, retry_later} responses. In a
particularly nasty/busy scenario this could cause replications to fail
since couch_rep_reader has a fixed number of requests it will
*attempt* to issue (100).

If
  (100 requests/replication) * (n concurrent replications) >>>
max_http_sessions * max_http_pipeline_size
then
  replications may fail when in fact there are no network errors.
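
To put rough numbers on the inequality above, here is a small Python
sketch. Every value except the 100-request figure is made up for
illustration; none of them come from a real deployment or from
CouchDB's defaults.

```python
# Illustrative arithmetic only. Apart from the 100-request figure quoted
# above, every number here is an assumption, not a real setting.
REQUESTS_PER_REPLICATION = 100  # fixed figure cited for couch_rep_reader

n_replications = 20         # assumed concurrent replications to one host
max_http_sessions = 2       # assumed (deliberately low) concurrency setting
max_http_pipeline_size = 5  # assumed (deliberately low) pipeline length


def capacity(sessions, pipeline_size):
    """Requests that can be in flight to one host-port at once."""
    return sessions * pipeline_size


def peak_demand(replications, per_replication=REQUESTS_PER_REPLICATION):
    """Requests the replications may try to issue in total."""
    return replications * per_replication


slots = capacity(max_http_sessions, max_http_pipeline_size)
demand = peak_demand(n_replications)
oversubscription = demand / slots

print(f"{demand} attempted requests for {slots} slots "
      f"({oversubscription:.0f}x oversubscribed)")
```

With a ratio this large, most attempts would see {error, retry_later}
even though the network itself is healthy.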

I propose to make couch_rep_httpc catch {error, retry_later} and treat
it specially. Specifically, it should not decrement the retry count.

My questions are:
  1) should there still be an exponential backoff for this retry?
  2) would you be in favor of committing this patch?

Re: ibrowse queue

Posted by Robert Newson <ro...@gmail.com>.

I'm always in favor of increasing rather than static backoffs because
it tolerates more environments (I use
http://en.wikipedia.org/wiki/Truncated_binary_exponential_backoff).

If it did back off this way would it still be necessary to treat
retry_later specially? The proposal not to decrement the retry counter
worries me because an operation that continues to fail should
eventually stop trying; clearly something is more broken than it
should be and adding to the problem is counterproductive.
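
For comparison, a rough Python sketch of truncated binary exponential
backoff with a bounded number of attempts, where retry_later is
treated like any other failure. The function names, slot time, and
limits are assumptions chosen for illustration, not values from
CouchDB or ibrowse.

```python
import random
import time


def backoff_delay(attempt, slot=0.05, max_exponent=10):
    """Truncated binary exponential backoff.

    After the n-th failure, wait a random whole number of slots in
    [0, 2**n - 1], with n capped at max_exponent. The slot length and
    the cap are illustrative assumptions.
    """
    exponent = min(attempt, max_exponent)
    return slot * random.randint(0, 2 ** exponent - 1)


def retry_with_backoff(send, max_attempts=10, sleep=time.sleep):
    """Try send() at most max_attempts times, backing off between tries.

    send() is a hypothetical callable returning ("ok", body) or
    ("error", reason). Every failure, retry_later included, uses up
    one attempt, so the operation always gives up eventually.
    """
    result = None
    for attempt in range(1, max_attempts + 1):
        result = send()
        if result[0] == "ok":
            return result
        if attempt < max_attempts:
            sleep(backoff_delay(attempt))
    return result
```

Because the waits grow with each failure, a briefly full pipeline
costs only a few short pauses, while a persistently failing request
still stops after `max_attempts`.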

B.

On Wed, Apr 7, 2010 at 11:10 PM, Randall Leeds <ra...@gmail.com> wrote:
> Also, I'm concerned that we cannot rely on couch not to starve the
> reader requests while allowing the missing revs requests, leading to
> an unbounded growth of the reader queue. The reason for this is that
> the reader requests are started with spawn_monitor and therefore
> Erlang scheduling might give the reader loop time to issue the next
> couch_rep_missing_revs:next/1 call before any or all of the document
> read processes call couch_rep_reader:open_doc_revs/3.
>

Re: ibrowse queue

Posted by Randall Leeds <ra...@gmail.com>.

Also, I'm concerned that we cannot rely on couch not to starve the
reader requests while allowing the missing revs requests, leading to
an unbounded growth of the reader queue. The reason for this is that
the reader requests are started with spawn_monitor and therefore
Erlang scheduling might give the reader loop time to issue the next
couch_rep_missing_revs:next/1 call before any or all of the document
read processes call couch_rep_reader:open_doc_revs/3.