Posted to dev@couchdb.apache.org by "Dave Cottlehuber (JIRA)" <ji...@apache.org> on 2013/07/24 09:27:49 UTC
[jira] [Commented] (COUCHDB-1856) hang and restart when replicate remotely of database that has doc > 10M, with 600kb/s network speed

    [ https://issues.apache.org/jira/browse/COUCHDB-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13718072#comment-13718072 ] 

Dave Cottlehuber commented on COUCHDB-1856:
-------------------------------------------

Hi [~zrao], you're running very close to the max memory limit of 32-bit processes on Windows; CouchDB is only available as a 32-bit release there for the moment.

I don't see anything in the log file related to the actual hang / crash, and I'm not sure this issue is really a replication one for the moment.

If your docs are large and your throughput is < 0.6 MB/sec, then it's likely you are timing out in per-doc replication on a slow/unreliable TCP link.
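
As a rough sanity check on that timing, the transfer time for one document can be compared against the replicator's HTTP timeout. This is only a sketch under assumptions: it reads "doc > 10M" as a 10 MB document, takes 0.6 MB/sec from the report and 300000 ms from the connection_timeout suggested below, and treats 30000 ms and 4 workers as assumed stock defaults that should be checked against your own default.ini. It ignores HTTP overhead and does not settle whether the timeout covers the whole request or only idle time.

```python
def transfer_seconds(doc_mb, throughput_mb_per_sec):
    """Seconds needed to move one document over the link, ignoring overhead."""
    if throughput_mb_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return doc_mb / throughput_mb_per_sec


def fits_in_timeout(doc_mb, throughput_mb_per_sec, timeout_ms, sharing_workers=1):
    """True if one document transfers within timeout_ms.

    sharing_workers models several replicator workers splitting the link
    bandwidth evenly, which is a simplifying assumption.
    """
    per_worker = throughput_mb_per_sec / sharing_workers
    return transfer_seconds(doc_mb, per_worker) * 1000 < timeout_ms


if __name__ == "__main__":
    # 30000 ms and 4 workers are assumed defaults; 300000 ms and 1 worker
    # are the values suggested in this comment.
    for timeout_ms in (30000, 300000):
        for workers in (1, 4):
            ok = fits_in_timeout(10, 0.6, timeout_ms, workers)
            print("timeout=%d ms, workers=%d, fits=%s" % (timeout_ms, workers, ok))
```

With these numbers a single worker needs roughly 17 seconds per document, so several workers sharing the same link can push a single request past a short timeout.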

Can you provide more information here?

- have you altered the replication configuration at all?
- are you running any filtered replication processes at either end?
- if your replication works locally then I suspect you have an issue either with the internet, or the remote endpoint
- more logs please, in debug mode from both ends
- any general info on your docs: JSON body size, number and length of attachments
- specific versions of Erlang and CouchDB used at both ends

For the moment, I'd suggest:

1. dropping your replication concurrency down in local.ini:

[replicator]
; one worker only
worker_processes = 1
; very small batch size to decrease replicator mem usage
worker_batch_size = 50
; use a 5 minute timeout for HTTP connection 
connection_timeout = 300000
; don't retry, fail immediately
retries_per_request = 1

You can also change these through the Futon UI, which won't require a restart. Note that these settings affect *all* replications, so take that into consideration.
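
The same settings can also be changed at runtime over HTTP, since CouchDB 1.x exposes its configuration under /_config/{section}/{key}. The following is a minimal sketch, not a tested procedure: the base URL http://localhost:5984 and the helper names are assumptions, and a server with admin accounts configured will also need credentials, which are left out here.

```python
import json
import urllib.request

# Values from the [replicator] section suggested above.
REPLICATOR_SETTINGS = {
    "worker_processes": "1",
    "worker_batch_size": "50",
    "connection_timeout": "300000",
    "retries_per_request": "1",
}


def build_config_requests(base_url, section, settings):
    """Return one PUT request per setting for the CouchDB 1.x /_config API."""
    requests = []
    for key, value in settings.items():
        url = "%s/_config/%s/%s" % (base_url.rstrip("/"), section, key)
        body = json.dumps(value).encode("utf-8")  # config values are JSON strings
        req = urllib.request.Request(url, data=body, method="PUT")
        req.add_header("Content-Type", "application/json")
        requests.append(req)
    return requests


def apply_config(requests):
    """Send each request; CouchDB replies with the previous value of the key."""
    previous = {}
    for req in requests:
        with urllib.request.urlopen(req) as resp:
            previous[req.full_url] = json.loads(resp.read().decode("utf-8"))
    return previous


if __name__ == "__main__":
    # Hypothetical local node; adjust the URL for your own server.
    reqs = build_config_requests("http://localhost:5984", "replicator",
                                 REPLICATOR_SETTINGS)
    print(apply_config(reqs))
```

Keeping the previous values returned by apply_config makes it easier to put the settings back afterwards.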

2. more logs

- enable debug mode on both ends, assuming that's feasible
- reset the replication changes above to previous settings
- retry the replication
- disable debug mode
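
The order of these steps can be written down as a small plan before touching either server. This is a hedged sketch only: it assumes debug mode corresponds to the level key in the [log] section, that "info" is the level to return to, and that the first entry in the list is the node that runs the replication. The host names, database names and the function itself are hypothetical, and the plan is only built here, not sent.

```python
import json


def debug_retry_plan(ends, source, target, previous_replicator=None,
                     normal_level="info"):
    """Return ordered (method, url, json_body) steps for one debug-logged retry.

    ends[0] is assumed to be the node that runs the replication.
    previous_replicator holds the [replicator] values to restore first.
    """
    plan = []
    # 1. enable debug logging on both ends
    for base in ends:
        plan.append(("PUT", base + "/_config/log/level", json.dumps("debug")))
    # 2. put the replicator settings back to what they were before
    for key, value in (previous_replicator or {}).items():
        url = "%s/_config/replicator/%s" % (ends[0], key)
        plan.append(("PUT", url, json.dumps(value)))
    # 3. retry the replication
    body = json.dumps({"source": source, "target": target})
    plan.append(("POST", ends[0] + "/_replicate", body))
    # 4. switch debug logging off again
    for base in ends:
        plan.append(("PUT", base + "/_config/log/level", json.dumps(normal_level)))
    return plan


if __name__ == "__main__":
    steps = debug_retry_plan(
        ["http://local.example:5984", "http://remote.example:5984"],
        "mydb", "http://remote.example:5984/mydb",
        previous_replicator={"worker_processes": "4"},
    )
    for method, url, body in steps:
        print(method, url, body)
```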

BTW, I likely won't have time to look further until late next week.
                
> hang and restart when replicate remotely of database that has doc > 10M, with 600kb/s network speed
> ---------------------------------------------------------------------------------------------------
>
>                 Key: COUCHDB-1856
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1856
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 1.3, 1.3.1
>            Reporter: Zhiqing Rao
>         Attachments: couch_replicate_hang_erl_process.jpg, couch_replicate_hang.log, couch_replicate_hang_networking.jpg
>
>
> When I remotely replicate a database that has doc > 10M, with 600kb/s network speed, on a Win7 64-bit platform with CouchDB 1.3.1, the CouchDB server launches the replication, but erl.exe soon reaches > 2GB of committed memory, and then
> the CouchDB server hangs until a restart.
> Two things that might be helpful:
> 1) Replicating the database within the same CouchDB server (on the same machine) works fine;
> 2) My web browsers (IE9/10, Chrome, Firefox) also sometimes hang or stop responding when I open the documents in the database by URL;
