Posted to notifications@couchdb.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/03/04 06:03:40 UTC
[jira] [Commented] (COUCHDB-2959) Deadlock condition in replicator
with remote source and configured 1 http connection
[ https://issues.apache.org/jira/browse/COUCHDB-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179328#comment-15179328 ]
ASF GitHub Bot commented on COUCHDB-2959:
-----------------------------------------
GitHub user nickva opened a pull request:
https://github.com/apache/couchdb-couch-replicator/pull/29
Adjust minimum number of http connections to 2
Replication changes feed and main replicator process could
end up waiting on the http connection to be available, and also
waiting on each other in a gen_server call. So set minimum
number of http connections to 2 to avoid deadlock.
JIRA: COUCHDB-2959
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloudant/couchdb-couch-replicator couchdb-2959
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/couchdb-couch-replicator/pull/29.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #29
----
commit a6921daa1c9d06497d9714ae72072fa13e71d344
Author: Nick Vatamaniuc <va...@gmail.com>
Date: 2016-03-04T04:38:20Z
Adjust minimum number of http connections to 2
Replication changes feed and main replicator process could
end up waiting on the http connection to be available, and also
waiting on each other in a gen_server call. So set minimum
number of http connections to 2 to avoid deadlock.
JIRA: COUCHDB-2959
----
> Deadlock condition in replicator with remote source and configured 1 http connection
> ------------------------------------------------------------------------------------
>
> Key: COUCHDB-2959
> URL: https://issues.apache.org/jira/browse/COUCHDB-2959
> Project: CouchDB
> Issue Type: Bug
> Components: Replication
> Reporter: Nick Vatamaniuc
> Attachments: rep.py
>
>
> A deadlock can cause newly started replications to get stuck (and never update their state to triggered). This happens with a remote source when using a single http connection and a single worker.
> The deadlock occurs in this case:
> - Replication process starts, it starts the changes reader: https://github.com/apache/couchdb-couch-replicator/blob/master/src/couch_replicator.erl#L276
> - Changes reader consumes the worker from httpc pool. At some point it will make a call back to the replication process to report how much work it has done using gen_server call {{report_seq_done}}
> - In the meantime, the main replication process calls {{get_pending_changes}} to get changes from the source. If the source is remote it will attempt to consume a worker from the httpc pool. However, that worker is in use by the changes feed process, so {{get_pending_changes}} is blocked waiting for a worker to be released.
> - So the changes feed is waiting for its {{report_seq_done}} call to the replication process to return while holding a worker, and the main replication process is waiting for the httpc pool to release that worker, so it never responds to {{report_seq_done}}.
> Attached a Python script (rep.py) to reproduce the issue. The script creates n databases (tested with n=1000), then replicates those databases to 1 single database. It also needs the Python CouchDB module from pip (or package repos).
> 1. It can be run from ipython by importing {{rep}}.
> 2. Start a dev cluster with {{./dev/run --admin=adm:pass}}
> 3. {{rep.replicate_1_to_n(1000)}}
> wait....
> 4. {{rep.check_untriggered()}}
> When it fails, result might look like this:
> {code}
> {
> 'rdyno_00001_00006': None,
> 'rdyno_00001_00158': None
> }
> {code}
>
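The wait cycle described in the issue above can be sketched in a few lines of Python. This is only an illustrative model, not the replicator's Erlang code: a semaphore stands in for the httpc worker pool, a queue stands in for the replication process's gen_server mailbox, and the name startup_completes and the timeout are assumptions made for the sketch. The timeout exists only so the stall can be observed; in the situation the issue describes, the replication process would wait indefinitely.

```python
import queue
import threading


def startup_completes(pool_size, timeout=0.2):
    """Model the startup sequence from the issue with a pool of `pool_size` workers.

    Returns True if the main process obtains a worker for its pending-changes
    request, False if it stalls (hypothetical helper, for illustration only).
    """
    pool = threading.BoundedSemaphore(pool_size)  # stands in for the httpc pool
    mailbox = queue.Queue()                       # stands in for the gen_server mailbox
    feed_started = threading.Event()

    def changes_reader():
        pool.acquire()                 # changes feed takes a worker
        reply = threading.Event()
        mailbox.put(reply)             # like the report_seq_done call
        feed_started.set()
        reply.wait()                   # blocks for the reply while holding the worker
        pool.release()

    reader = threading.Thread(target=changes_reader)
    reader.start()
    feed_started.wait()

    # Main process: like get_pending_changes, it needs a worker of its own
    # before it gets back to answering calls in its mailbox.
    got_worker = pool.acquire(timeout=timeout)
    if got_worker:
        pool.release()

    # Answer the pending call. With one worker this is reached only because
    # the sketch gives up after `timeout`.
    mailbox.get().set()
    reader.join()
    return got_worker
```

With these assumptions, startup_completes(1) reports a stall and startup_completes(2) completes, which matches the reasoning behind raising the minimum number of http connections to 2.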
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)