Posted to dev@couchdb.apache.org by "Darren Gibbard (JIRA)" <ji...@apache.org> on 2014/02/18 16:50:21 UTC

[jira] [Created] (COUCHDB-2070) [1.4.0] CouchDB Replication Crashes

Darren Gibbard created COUCHDB-2070:
---------------------------------------

             Summary: [1.4.0] CouchDB Replication Crashes
                 Key: COUCHDB-2070
                 URL: https://issues.apache.org/jira/browse/COUCHDB-2070
             Project: CouchDB
          Issue Type: Bug
      Security Level: public (Regular issues)
          Components: Replication
            Reporter: Darren Gibbard


Hi all,
I have an issue that appears to have followed me from v1.2.1 with Erlang R14 through an upgrade to v1.4.0 with R16B01.

I have 20 "remote" nodes and one "central" node. Each of the remote instances is configured with bidirectional replication (i.e. no replication is defined on the Central node directly). There is a single main database of ~600,000 documents, ~11GB in size.
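
For illustration only, a bidirectional setup of this kind could be expressed as a pair of replication documents per remote node, one pushing to the central node and one pulling from it. This is a minimal sketch, not the actual configuration: the URLs, the database name "maindb", the node name, the document IDs, and the use of continuous replication are all assumptions.

```python
import json


def replication_docs(remote_db_url, central_db_url, node_name):
    """Build a push and a pull replication document for one remote node.

    The document shape (source/target/continuous) follows CouchDB's
    replication document fields; the IDs are hypothetical.
    """
    push = {
        "_id": node_name + "-push",   # hypothetical ID scheme
        "source": remote_db_url,
        "target": central_db_url,
        "continuous": True,           # assumption: continuous replication
    }
    pull = {
        "_id": node_name + "-pull",   # hypothetical ID scheme
        "source": central_db_url,
        "target": remote_db_url,
        "continuous": True,           # assumption: continuous replication
    }
    return [push, pull]


# Placeholder URLs and names, not the real hosts.
docs = replication_docs(
    "http://localhost:5984/maindb",
    "http://central.example.com:5984/maindb",
    "remote01",
)
print(json.dumps(docs, indent=2))
```

With 20 remote nodes each holding such a pair, the central node ends up serving 40 replications without having any defined locally.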

On the remote nodes, and more frequently on the Central node, I get *huge* (3000+ lines) errors in the logs, seemingly intermittently; I have yet to track down the root cause. The open file handle limit and ERL_MAX_PORTS are both set to values upwards of 16k.

Other stats:
{noformat}
$ sudo su - couchdb -c "lsof | grep -c ."
1511

$ sudo netstat -npla | grep "ESTAB" | grep -c .
310

$ ps -ef | grep -c "^couchdb" 
19
{noformat}

An example log from a Remote node is: http://dgunix.com/cdblog/couchdb_v1.4.0_erl16B01.20140218.log
An example log from the Central node is: http://dgunix.com/cdblog/couchdb_v1.4.0_erl16B01_central.20140218.log

The main error line appears to be "{error,{error,req_timedout}}}}", raised for "_bulk_docs" on the remote nodes and for "_revs_diff" on the central node.
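
To get a rough idea of how the timeouts split between the two endpoints, the logs could be tallied with something like the sketch below. It is a heuristic only: because the errors span many lines, it attributes each "req_timedout" line to the most recently mentioned endpoint, which is an assumption about the log layout and may miscount. The function name is made up for this example.

```python
from collections import Counter

# Endpoint names taken from the error description above.
ENDPOINTS = ("_bulk_docs", "_revs_diff")


def count_timeouts(lines):
    """Tally req_timedout lines by the most recently seen endpoint name.

    `lines` is any iterable of log lines, e.g. an open file object.
    Timeouts seen before any endpoint is mentioned count as "unknown".
    """
    counts = Counter()
    last_endpoint = "unknown"
    for line in lines:
        # Remember the latest endpoint mentioned, on this or an earlier line.
        for endpoint in ENDPOINTS:
            if endpoint in line:
                last_endpoint = endpoint
        if "req_timedout" in line:
            counts[last_endpoint] += 1
    return counts
```

Passing an open log file to count_timeouts returns a Counter keyed by endpoint name, which can then be compared between a remote node and the central node.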



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)