Posted to dev@couchdb.apache.org by "Jeff Hinrichs (JIRA)" <ji...@apache.org> on 2009/03/01 05:16:12 UTC
[jira] Updated: (COUCHDB-270) Replication w/ Large Attachments Fails
[ https://issues.apache.org/jira/browse/COUCHDB-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Hinrichs updated COUCHDB-270:
----------------------------------
Attachment: couchdb270_Test.py
The attached script can produce a number of replication errors. Hopefully a Python-based script is helpful for you. The script requires couchdb-python 0.5 and nose (to run the tests).
One thing I have discovered is that the replication issues are not limited to attachments but are related to overall document size. I am reproducing these replication issues using only large documents.
You will need to modify the top of the script
srvAuri = 'http://192.168.2.52:5984/'
srvBuri = 'http://192.168.2.194:5984/'
to match your environment.
The bigger the document, the faster and harder CouchDB dies. 1MB documents are enough to make it groan before going away; 10MB documents will occasionally garner a whimper before death, though not always; 20MB documents are akin to a head shot.
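The attached script itself is not reproduced here; purely as an illustration, a minimal sketch of this kind of repro against CouchDB's stock REST API (PUT a database, PUT large docs, POST to _replicate) might look like the following. The server URIs, database name, and helper names are placeholders, not taken from the attached script.

```python
# Hypothetical sketch -- NOT the attached couchdb270_Test.py. It pushes
# large documents to server A, then asks server B to pull-replicate,
# using only CouchDB's standard REST endpoints and the Python stdlib.
import json
import urllib.request

SRV_A = "http://192.168.2.52:5984/"   # Machine A, holds the original db
SRV_B = "http://192.168.2.194:5984/"  # Machine B, does the PULL replication
DB = "bigdocs"                        # placeholder database name

def make_doc(size_mb):
    """Build a document whose body is roughly size_mb megabytes of JSON."""
    return {"payload": "x" * (size_mb * 1024 * 1024)}

def pull_replication_body(source_uri, target_db):
    """JSON body for POST /_replicate that makes the target pull from source."""
    return {"source": source_uri, "target": target_db}

def send(method, url, body=None):
    """Issue a JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def reproduce():
    """Run against live servers: create the source db, fill it, pull it."""
    send("PUT", SRV_A + DB)           # create the source database
    for i in range(20):               # 1MB docs are enough to make it groan
        send("PUT", "%s%s/doc-%04d" % (SRV_A, DB, i), make_doc(1))
    # Ask Machine B to pull the database from Machine A.
    send("POST", SRV_B + "_replicate", pull_replication_body(SRV_A + DB, DB))
```

Calling `reproduce()` with the URIs pointed at two real CouchDB instances should exercise the same pull-replication path the report describes.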
> Replication w/ Large Attachments Fails
> --------------------------------------
>
> Key: COUCHDB-270
> URL: https://issues.apache.org/jira/browse/COUCHDB-270
> Project: CouchDB
> Issue Type: Bug
> Components: Database Core
> Affects Versions: 0.9
> Environment: Apache CouchDB 0.9.0a748379
> Reporter: Jeff Hinrichs
> Attachments: couchdb270_Test.py
>
>
> Attempting to replicate a database with largish attachments (<= ~18MB of attachments in a doc, fewer than 200 docs) from one machine to another fails consistently and at the same point.
> Scenario:
> Both servers are running from HEAD, which I've been tracking for some time. This problem has been around as long as I've been using couch.
> Machine A holds the original database, Machine B is the server that is doing a PULL replication
> During the replication, Machine A starts showing the following sporadically in the log:
> [Fri, 27 Feb 2009 14:02:48 GMT] [debug] [<0.5902.3>] 'GET'
> /delasco-invoices/INV00652429?revs=true&attachments=true&latest=true&open_revs=["425644723"]
> {1,
> 1}
> Headers: [{'Host',"192.168.2.52:5984"}]
> [Fri, 27 Feb 2009 14:02:48 GMT] [error] [<0.5901.3>] Uncaught error in
> HTTP request: {exit,normal}
> [Fri, 27 Feb 2009 14:02:48 GMT] [debug] [<0.5901.3>] Stacktrace:
> [{mochiweb_request,send,2},
> {couch_httpd,send_chunk,2},
> {couch_httpd_db,db_doc_req,3},
> {couch_httpd_db,do_db_req,2},
> {couch_httpd,handle_request,3},
> {mochiweb_http,headers,5},
> {proc_lib,init_p,5}]
> [Fri, 27 Feb 2009 14:02:48 GMT] [debug] [<0.5901.3>] HTTPd 500 error response:
> {"error":"error","reason":"normal"}
> As the replication continues, the frequency of these "Uncaught error in HTTP request: {exit,normal}" errors increases, until the error is being repeated constantly. Then Machine B stops sending requests: no more log output, no errors. The last thing in Machine B's log file is:
> [Fri, 27 Feb 2009 14:03:24 GMT] [info] [<0.20893.1>] retrying
> couch_rep HTTP get request due to {error, req_timedout}: [104,116,
> 116,112,58,
> 47,47,49,
> 57,50,46,
> 49,54,56,
> 46,50,46,
> 53,50,58,
> 53,57,56,
> 52,47,100,
> 101,108,97,
> 115,99,111,
> 45,105,110,
> 118,111,
> 105,99,101,
> 115,47,73,
> 78,86,48,
> 48,54,53,
> 50,49,51,
> 56,63,114,
> 101,118,
> 115,61,116,
> 114,117,
> 101,38,97,
> 116,116,97,
> 99,104,109,
> 101,110,
> 116,115,61,
> 116,114,
> 117,101,38,
> 108,97,116,
> 101,115,
> 116,61,116,
> 114,117,
> 101,38,111,
> 112,101,
> 110,95,114,
> 101,118,
> 115,61,91,
> 34,
> <<"3070455362">>,
> 34,93]
> A request for status from the couchdb init.d script returns nothing, and checking the processes shows:
> (demo-couchdb)jlh@mars:~/projects/venvs/demo-couchdb/src$ ps ax|grep cou
> 29281 pts/2 S+ 0:00 grep cou
> (demo-couchdb)jlh@mars:~/projects/venvs/demo-couchdb/src$ ps ax|grep beam
> 29305 pts/2 R+ 0:00 grep beam
> In fact, couch has gone away completely on Machine B. Its death is so quick it can't even say why.
> Attempts to incrementally replicate after the first failure die at exactly the same place.
> I can replicate this same database on the same machine from one database to another without issue. I can dump and reload the database with no problems.
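A side note on reading the couch_rep retry message quoted above: Erlang logs the request URL as a character list (each integer is a byte value), with the revision embedded as a binary (<<"3070455362">>). A throwaway snippet, not part of the attached test script, can rejoin the bytes into the readable URL:

```python
# Decode the Erlang character list from the couch_rep retry log.
# The integers are the byte values of the URL string; the revision
# appears separately as the binary <<"3070455362">>.
erl_chars = [104,116,116,112,58,47,47,49,57,50,46,49,54,56,46,50,46,
             53,50,58,53,57,56,52,47,100,101,108,97,115,99,111,45,
             105,110,118,111,105,99,101,115,47,73,78,86,48,48,54,53,
             50,49,51,56,63,114,101,118,115,61,116,114,117,101,38,97,
             116,116,97,99,104,109,101,110,116,115,61,116,114,117,
             101,38,108,97,116,101,115,116,61,116,114,117,101,38,111,
             112,101,110,95,114,101,118,115,61,91,34]
# Splice the binary revision between the head ([...open_revs=[") and
# the trailing two characters ("]).
url = (bytes(erl_chars).decode("ascii")
       + "3070455362"
       + bytes([34, 93]).decode("ascii"))
print(url)
```

The decoded URL is the same kind of per-document GET (revs, attachments, latest, open_revs) that Machine A was serving when the errors began.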
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.