Posted to dev@couchdb.apache.org by "James Marca (JIRA)" <ji...@apache.org> on 2011/07/18 19:50:57 UTC

[jira] [Created] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind

Replication causes CouchDB to crash.  I *suspect* a memory leak of some kind
----------------------------------------------------------------------------

                 Key: COUCHDB-1226
                 URL: https://issues.apache.org/jira/browse/COUCHDB-1226
             Project: CouchDB
          Issue Type: Bug
          Components: Replication
    Affects Versions: 1.1
         Environment: Gentoo Linux, CouchDB built using standard ebuild.  Rebuilt July 2011.
            Reporter: James Marca


When replicating databases (pull replication), CouchDB will silently crash.  I suspect a memory leak is leading to the crash, because I watch the beam process slowly creep up in RAM usage, then the server dies.

For the crashing server, the log at "debug" level doesn't seem very helpful.  It says (with manually scrubbed server address):

[Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/
[Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/
[Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for vdsdata/d12/2007/1210882
[Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for vdsdata/d12/2007/1210882
[Mon, 18 Jul 2011 16:23:20 GMT] [info] [<0.10032.0>] starting new replication "431a3f5bae52a6b27da72e42dc7b9fe3+create_target" at <0.10054.0>
[Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 1
[Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #1
[Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 2
[Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #2
[Mon, 18 Jul 2011 16:23:23 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 10
[Mon, 18 Jul 2011 16:23:23 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #10
[Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #14
[Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 14
[Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 20
[Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #20
[Mon, 18 Jul 2011 16:23:25 GMT] [debug] [<0.10054.0>] target doesn't need a full commit
[Mon, 18 Jul 2011 16:23:36 GMT] [info] [<0.10054.0>] recording a checkpoint for http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 20
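
Since the log simply stops after the last checkpoint, one way to see how far each replication got before the crash is to pull the checkpoint lines out of the log. The following is a minimal sketch, assuming the log format is exactly what is shown in the excerpt above; the function name last_checkpoints is made up for illustration and is not part of CouchDB.

```python
import re

# Pattern for the "recording a checkpoint" lines, modelled on the excerpt above.
# Assumption: other CouchDB versions may word this line differently.
CHECKPOINT_RE = re.compile(
    r"^\[(?P<time>[^\]]+)\] \[info\] \[<[\d.]+>\] recording a checkpoint for "
    r"(?P<source>\S+) -> (?P<target>\S+) at source update_seq (?P<seq>\d+)"
)


def last_checkpoints(log_lines):
    """Return {(source, target): (timestamp, seq)} for the last checkpoint per pair."""
    latest = {}
    for line in log_lines:
        match = CHECKPOINT_RE.match(line.strip())
        if match:
            key = (match["source"], match["target"])
            # Later lines overwrite earlier ones, so the final value is the
            # most recent checkpoint recorded before the log ends.
            latest[key] = (match["time"], int(match["seq"]))
    return latest
```

Run against the excerpt above, this would report update_seq 20 as the last checkpoint for vdsdata/d12/2007/1210882, which is consistent with the restarted job resuming at seq 22.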

Then, when I restart CouchDB and restart the node.js program that sets up the replication jobs, the crashed replication job picks up where it left off and completes just fine.  Again, I scrubbed my server addresses in this log snippet:

[Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3562.0>] 'POST' /_replicate {1,1} from "128.*.*.*"
Headers: [{'Authorization',"Basic amFtZXM6bWdpY24wbWIzcg=="},
          {'Connection',"close"},
          {'Content-Type',"application/json"},
          {'Host',"***[pullserver]***.edu"},
          {'Transfer-Encoding',"chunked"}]
[Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3562.0>] OAuth Params: []
[Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3580.0>] found a replication log for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/
[Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3580.0>] found a replication log for vdsdata/d12/2007/1210882
[Mon, 18 Jul 2011 17:22:53 GMT] [info] [<0.3562.0>] starting new replication "431a3f5bae52a6b27da72e42dc7b9fe3+create_target" at <0.3580.0>
[Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 22
[Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #22
[Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 37
[Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #37
[Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 39
[Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #39
[Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 47
[Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #47
[Mon, 18 Jul 2011 17:23:00 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 57
[Mon, 18 Jul 2011 17:23:00 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #57
[Mon, 18 Jul 2011 17:23:01 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
[Mon, 18 Jul 2011 17:23:09 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 57
[Mon, 18 Jul 2011 17:23:19 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #62
[Mon, 18 Jul 2011 17:23:19 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 62
[Mon, 18 Jul 2011 17:23:22 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #78
[Mon, 18 Jul 2011 17:23:24 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
[Mon, 18 Jul 2011 17:23:29 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 78
[Mon, 18 Jul 2011 17:23:57 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #255
[Mon, 18 Jul 2011 17:24:02 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
[Mon, 18 Jul 2011 17:24:02 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 255
[Mon, 18 Jul 2011 17:24:09 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #347
[Mon, 18 Jul 2011 17:24:09 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
[Mon, 18 Jul 2011 17:24:09 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 347
[Mon, 18 Jul 2011 17:24:09 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: Finishing
[Mon, 18 Jul 2011 17:24:09 GMT] [info] [<0.3562.0>] 128.*.*.* - - 'POST' /_replicate 200
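
For anyone trying to reproduce this, the request in the log above (a POST to /_replicate with a remote source, a local target, and create_target) can be sketched roughly as follows. This is not the reporter's node.js program; it is a Python illustration, and the host names in the usage note are placeholders. Authentication is omitted. The database name contains slashes, so it is percent-encoded in the source URL, as in the log (Python emits %2F in upper case, which is equivalent to the %2f shown there).

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_pull_replication(target_server, source_server, db_name):
    """Build (but do not send) a POST /_replicate request for a pull replication.

    target_server and source_server are base URLs such as "http://host:5984".
    The function name and parameters are illustrative assumptions.
    """
    # Encode every reserved character, including "/", in the database name.
    encoded_name = quote(db_name, safe="")
    body = {
        "source": f"{source_server}/{encoded_name}",  # remote database (pull)
        "target": db_name,                            # local database on the target
        "create_target": True,                        # matches "+create_target" in the log
    }
    return Request(
        f"{target_server}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it, for example build_pull_replication("http://localhost:5984", "http://source.example.edu:5984", "vdsdata/d12/2007/1210882").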

Letting that replication program run, and watching top, CouchDB's total share of RAM crept up to 70%, then it crashed.
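
Watching top by eye works, but a timestamped record of the beam process's resident memory is easier to attach to a report. Below is a minimal sketch, assuming a Linux /proc filesystem (as on the Gentoo box described here); the function names are made up for illustration, and the PID of the beam process has to be looked up separately.

```python
def parse_vm_rss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status.

    Returns None if no VmRSS line is present (for example, kernel threads).
    """
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:    123456 kB".
            return int(line.split()[1])
    return None


def read_rss_kb(pid):
    """Read the current resident set size of a process in kB (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_rss_kb(handle.read())
```

Calling read_rss_kb in a loop with a sleep between samples, and printing each value with a timestamp, would give a growth curve that can be lined up against the replication log.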

Again, the log on the crashing server isn't helpful (more or less the same as above).

The replication program gets through about 8 to 12 databases before it crashes.

Each database (when replicated to the target server) takes up on average around 700MB, un-compacted.

The databases are all similar (annual data for detectors), with one doc per day's data.  Each document is around 700K.
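
As a rough cross-check of these figures, the numbers can be put side by side. This is back-of-the-envelope arithmetic only, assuming "700K" means about 700 kilobytes and that each database holds a full year of 365 daily documents; neither assumption is confirmed above.

```python
DOC_SIZE_KB = 700            # "around 700K" per document, read as kilobytes (assumption)
DOCS_PER_DB = 365            # one doc per day for a full year (assumption)
DB_ON_DISK_MB = 700          # average un-compacted size on the target, as reported
DBS_BEFORE_CRASH = (8, 12)   # databases replicated before a crash, as reported

# Raw document data per database, in decimal megabytes.
doc_data_mb = DOC_SIZE_KB * DOCS_PER_DB / 1000

# How much larger the un-compacted file is than the raw document data.
on_disk_ratio = DB_ON_DISK_MB / doc_data_mb

# Total data written to the target between restarts, in decimal gigabytes.
replicated_gb = tuple(n * DB_ON_DISK_MB / 1000 for n in DBS_BEFORE_CRASH)

print(f"raw document data per database: {doc_data_mb:.1f} MB")
print(f"on-disk size / raw data: {on_disk_ratio:.2f}x")
print(f"replicated before crash: {replicated_gb[0]:.1f} to {replicated_gb[1]:.1f} GB")
```

Under those assumptions each database carries roughly 255 MB of document data, and the server writes somewhere between 5.6 and 8.4 GB before it dies. This does not by itself explain the memory growth; it only gives a sense of scale.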

If there is any more information (or more helpful information) I can provide, please let me know.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind

Posted by "Randall Leeds (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067407#comment-13067407 ] 

Randall Leeds commented on COUCHDB-1226:
----------------------------------------

About how large is the biggest document on the source?

> Replication causes CouchDB to crash.  I *suspect* a memory leak of some kind
> ----------------------------------------------------------------------------
>
>                 Key: COUCHDB-1226
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1226
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 1.1
>         Environment: Gentoo Linux, CouchDB built using standard ebuild.  Rebuilt July 2011.
>            Reporter: James Marca
>         Attachments: topcouch.log


        

[jira] [Commented] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069301#comment-13069301 ] 

Filipe Manana commented on COUCHDB-1226:
----------------------------------------

Thanks for testing and reporting, James.



        

[jira] [Commented] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind

Posted by "Randall Leeds (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068726#comment-13068726 ] 

Randall Leeds commented on COUCHDB-1226:
----------------------------------------

Crash dump might be useful. It looks like that's 3.3 GB of heap! Not sure why it would be allocating so much.

> [Mon, 18 Jul 2011 17:23:01 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 17:23:09 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 57
> [Mon, 18 Jul 2011 17:23:19 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #62
> [Mon, 18 Jul 2011 17:23:19 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 62
> [Mon, 18 Jul 2011 17:23:22 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #78
> [Mon, 18 Jul 2011 17:23:24 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 17:23:29 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 78
> [Mon, 18 Jul 2011 17:23:57 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #255
> [Mon, 18 Jul 2011 17:24:02 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 17:24:02 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 255
> [Mon, 18 Jul 2011 17:24:09 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #347
> [Mon, 18 Jul 2011 17:24:09 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 17:24:09 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 347
> [Mon, 18 Jul 2011 17:24:09 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: Finishing
> [Mon, 18 Jul 2011 17:24:09 GMT] [info] [<0.3562.0>] 128.*.*.* - - 'POST' /_replicate 200
> Letting that replication program run, and watching top, CouchDB's total share of RAM crept up to 70%, then it crashed.
> Again, the log on the crashing server isn't helpful (more or less the same as above)
> The replication program gets through about 8 to 12 databases before it crashes.
> Each database (when replicated to the target server) takes up on average around 700MB, un-compacted.
> The databases are all similar (annual data for detectors), with one doc per day's data.  Each document is around 700K.
> If there is any more information (or more helpful information) I can provide, please let me know.


        

[jira] [Commented] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind

Posted by "James Marca (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067463#comment-13067463 ] 

James Marca commented on COUCHDB-1226:
--------------------------------------


Hard to say exactly, but *all* documents are around 700KB.  Each
document contains a single day's worth of information for a detector,
with data collected every 30s.  Sometimes data isn't collected, so in
those cases there are null values which should take up less space, but
there are still usually 37 arrays of data by 2880 time steps.  The
larger databases have data for each day, while smaller ones have
missing days.
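
As a rough sanity check on those figures, here is a small sketch of the arithmetic. It assumes "700KB" means 700,000 bytes and that a full year has 365 daily documents; neither assumption is stated in the report.

```python
# Back-of-the-envelope arithmetic for the document sizes described above.
# Assumptions (not from the report): 700KB == 700,000 bytes, 365 docs per year.
SECONDS_PER_DAY = 24 * 60 * 60
SAMPLE_INTERVAL_S = 30          # data collected every 30s
ARRAYS_PER_DOC = 37             # "37 arrays of data"
DOC_SIZE_BYTES = 700 * 1000     # "around 700KB" per document

steps_per_day = SECONDS_PER_DAY // SAMPLE_INTERVAL_S   # time steps in one day
values_per_doc = ARRAYS_PER_DOC * steps_per_day        # values in one document
bytes_per_value = DOC_SIZE_BYTES / values_per_doc      # average JSON bytes per value

docs_per_year = 365
raw_year_mb = docs_per_year * DOC_SIZE_BYTES / 1e6     # raw JSON for a full year

print(steps_per_day, values_per_doc, round(bytes_per_value, 1), round(raw_year_mb, 1))
```

The 2880 time steps match one sample every 30 seconds over 24 hours. The raw JSON for a full year comes out well below the roughly 700MB reported per database, but that figure is the un-compacted on-disk size, so the two numbers are not directly comparable.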

(Not sure if replying to email gets into JIRA.)

regards,
James







        

[jira] [Commented] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind

Posted by "James Marca (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067367#comment-13067367 ] 

James Marca commented on COUCHDB-1226:
--------------------------------------

I just tried using the _replicator database (the new way) rather than POST calls to /_replicate (the old way).  The crash still happens with the same characteristics: replication works fine at first, RAM usage increases, and then CouchDB dies without any log message.
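
For readers unfamiliar with the two approaches, here is a minimal sketch of how each request could be built. The host names, database names, and document id are placeholders (the real ones are scrubbed in the logs), and `create_target` is set only because the replication id in the log ends in "+create_target". This is not the node.js program used in the report.

```python
import json
import urllib.parse
import urllib.request


def replication_body(source, target, create_target=True):
    """JSON body shared by both ways of starting a replication."""
    return {"source": source, "target": target, "create_target": create_target}


def post_replicate_request(couch_url, body):
    """The old way: a one-off POST to /_replicate on the pulling server."""
    return urllib.request.Request(
        couch_url.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def replicator_doc_request(couch_url, doc_id, body):
    """The new way: store the same body as a document in the _replicator database."""
    return urllib.request.Request(
        couch_url.rstrip("/") + "/_replicator/" + urllib.parse.quote(doc_id, safe=""),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder names, for illustration only.
body = replication_body(
    "http://source.example.edu:5984/vdsdata%2fd12%2f2007%2f1210882/",
    "vdsdata/d12/2007/1210882",
)
old_way = post_replicate_request("http://localhost:5984", body)
new_way = replicator_doc_request("http://localhost:5984", "pull-d12-2007-1210882", body)
# Sending either request would be: urllib.request.urlopen(old_way)
```

Both requests carry the same source and target, so the same replication work is requested either way, which is consistent with the crash showing up in both cases.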



        

[jira] [Commented] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind

Posted by "James Marca (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068782#comment-13068782 ] 

James Marca commented on COUCHDB-1226:
--------------------------------------


Yeah, that's why I suspect it is a memory leak... it shouldn't actually
need that much to process a 900MB database.

You can find the crash dump at http://anne.its.uci.edu/tmp/erl_crash.dump
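
One way to get a first look at such a dump without extra tooling is to list the processes with the largest heaps. The sketch below assumes the usual text layout of erl_crash.dump, where each process section starts with a `=proc:<pid>` line and contains a `Stack+heap:` field (reported in words, as far as I know). The function name and defaults are made up for illustration, and nothing here is taken from the dump linked above.

```python
def largest_processes(lines, field="Stack+heap", top=5):
    """Return the `top` (size, pid) pairs with the largest `field` value.

    Assumes sections start with '=<tag>:<id>' lines and that process
    sections use the '=proc:' tag; other sections are ignored.
    """
    sizes = []
    current_pid = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("="):
            # A new section begins; remember the pid only for process sections.
            current_pid = line[len("=proc:"):] if line.startswith("=proc:") else None
        elif current_pid is not None and line.startswith(field + ":"):
            value = line.split(":", 1)[1].strip()
            if value.isdigit():
                sizes.append((int(value), current_pid))
    sizes.sort(reverse=True)
    return sizes[:top]


# Usage (hypothetical local copy of the dump):
#   with open("erl_crash.dump", errors="replace") as f:
#       for size, pid in largest_processes(f):
#           print(pid, size)
```

If one or two processes dominate the list, that would point at where the memory is held; a flat distribution would suggest the growth is elsewhere, for example in binaries.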





[jira] [Updated] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind

Posted by "James Marca (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Marca updated COUCHDB-1226:
---------------------------------

    Attachment: topcouch.log

This file is an edited version of the output of top, captured while CouchDB was handling replication. It shows RAM usage rising gradually across 7 consecutive database replications, with CouchDB crashing on the 7th.
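For anyone who wants to record the same memory growth without watching top by hand, here is a minimal sketch. It assumes a Linux host (as in this report), where /proc/<pid>/status contains a VmRSS line. The function names rss_kib and read_rss_kib are hypothetical helpers made up for this illustration; they are not part of CouchDB or of the attached log.

```python
def rss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     123456 kB"; the second field is the number.
            return int(line.split()[1])
    return None


def read_rss_kib(pid):
    """Read the resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return rss_kib(handle.read())
```

Calling read_rss_kib with the pid of the beam process at a fixed interval and logging the result would give a time series comparable to the edited top output.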



[jira] [Commented] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind

Posted by "James Marca (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069292#comment-13069292 ] 

James Marca commented on COUCHDB-1226:
--------------------------------------

I just tried upgrading to trunk (after reading COUCHDB-1230), and the crashing has stopped.

CouchDB 1.2.0a-b0baa80-git
Spidermonkey 1.8.5

I am now (finally) relaxing while replicating multiple (12) databases with large documents to a CouchDB 1.1.0 server. CPU load is 2%, RAM usage is 2.2%, and the network is saturated with _bulk_docs puts. That is orders of magnitude better performance.

I understand that 1.2 contains a major reworking of the replication engine, and for my situation it works perfectly...many thanks to the developer.

If someone could tag this "fixed in 1.2" I'd appreciate it.  I do not know how to use JIRA at all.
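For readers who want to reproduce the setup, the replication jobs in the quoted log are POST requests to /_replicate with the create_target option. Below is a minimal sketch of how such a request body could be built. The host name source.example.edu and the helper name pull_replication_body are assumptions for illustration; the reporter's actual node.js program is not shown in this thread.

```python
import json
from urllib.parse import quote


def pull_replication_body(source_host, db_name, create_target=True):
    """Build a JSON-serializable body for POST /_replicate on the pulling server."""
    # Slashes in a database name must be percent-encoded in the remote URL.
    encoded = quote(db_name, safe="")
    return {
        "source": f"http://{source_host}:5984/{encoded}/",
        "target": db_name,  # local database, referred to by its plain name
        "create_target": create_target,
    }


if __name__ == "__main__":
    body = pull_replication_body("source.example.edu", "vdsdata/d12/2007/1210882")
    print(json.dumps(body, indent=2))
```

The body would be sent to the target (pulling) server with Content-Type application/json. Authentication is left out of this sketch.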



[jira] [Commented] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067263#comment-13067263 ] 

Filipe Manana commented on COUCHDB-1226:
----------------------------------------

Which Erlang/OTP version are you running?
Also, have you tried compacting the databases before replicating, to see if it helps? (This might be related to COUCHDB-968.)
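Compaction is triggered over HTTP with a POST to the database's _compact resource. The sketch below only builds the request and does not send it. The base URL http://localhost:5984 and the helper name compaction_request are assumptions for illustration, and the admin credentials that CouchDB requires for compaction are omitted.

```python
import urllib.request
from urllib.parse import quote


def compaction_request(base_url, db_name):
    """Build (but do not send) a POST request for <base_url>/<db>/_compact."""
    # Database names containing slashes must be percent-encoded in the path.
    encoded = quote(db_name, safe="")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{encoded}/_compact",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = compaction_request("http://localhost:5984", "vdsdata/d12/2007/1210882")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Compaction runs in the background on the server, so it would need to finish on the source database before a replication is started in order to test the suggestion above.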



[jira] [Commented] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind

Posted by "Robert Newson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068874#comment-13068874 ] 

Robert Newson commented on COUCHDB-1226:
----------------------------------------

I've definitely seen this before in production. Unfortunately it's very hard to determine a cause. I started to suspect the Erlang VM itself. In my case it tried to allocate slightly more RAM than the machine had in total (so as not to keep you in suspense, it failed to do so).


[jira] [Commented] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind

Posted by "James Marca (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069072#comment-13069072 ] 

James Marca commented on COUCHDB-1226:
--------------------------------------



I stared at this a lot last night.  It always fails on the replicator,
not the replicatee, regardless of whether it is push or pull.

What happens is that (on restart) it writes out a checkpoint, then
the data gets pushed to the target, then it writes another checkpoint,
say 30 more than the first one.

While doing that CPU is really high and RAM bobbles up and down, but
generally drifts up.

Then it crashes and I restart (this machine only has 4 gig, not 8) and
it tries again.

So it is something in the writing of the checkpoints, and possibly
related to how big my docs are.

Is there any way to ask it to write checkpoints more often than once
every 25 to 30 documents?
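
One way to check how far apart the checkpoints actually land is to pull the `update_seq` values out of the "recording a checkpoint" lines in the log. This is a minimal sketch that assumes the log line format shown in the excerpts in this thread; the function names are made up for illustration, and a gap in `update_seq` is only a rough proxy for the number of documents transferred.

```python
import re

# Matches the "recording a checkpoint ... at source update_seq N" lines
# as they appear in the CouchDB 1.1 log excerpts quoted in this thread.
CHECKPOINT_RE = re.compile(
    r"recording a checkpoint for .* at source update_seq (\d+)"
)


def checkpoint_seqs(log_lines):
    """Return the source update_seq of every checkpoint line, in log order."""
    seqs = []
    for line in log_lines:
        match = CHECKPOINT_RE.search(line)
        if match:
            seqs.append(int(match.group(1)))
    return seqs


def checkpoint_gaps(seqs):
    """Return the difference between each checkpoint and the previous one."""
    return [later - earlier for earlier, later in zip(seqs, seqs[1:])]


# Checkpoint lines copied from the restarted replication in the issue
# description (timestamps and process ids trimmed for brevity).
SAMPLE_LOG = [
    "[info] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 57",
    "[debug] target doesn't need a full commit",
    "[info] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 78",
    "[info] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 255",
    "[info] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 347",
]

if __name__ == "__main__":
    seqs = checkpoint_seqs(SAMPLE_LOG)
    print(seqs, checkpoint_gaps(seqs))
```
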

Also, I am going to try this on my laptop, which is running Slackware
rather than Gentoo, to see if maybe it is something in the toolchain or
libraries that is causing the crash.

Regards, 
James


[jira] [Commented] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind

Posted by "James Marca (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067341#comment-13067341 ] 

James Marca commented on COUCHDB-1226:
--------------------------------------

Compacting the databases prior to replicating had no effect.  CouchDB's beam process still grew in size with each replication, and then CouchDB shut down.
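
To put numbers on the growth rather than eyeballing top, the resident set size of the beam process can be sampled after each replication. The following is a Linux-only sketch that reads the `VmRSS` line from `/proc/<pid>/status`; the helper names and the sample process name `beam.smp` are assumptions for illustration, and the pid of the beam process has to be looked up separately.

```python
def rss_kib(status_text):
    """Extract VmRSS (resident set size, in kB) from the text of
    /proc/<pid>/status. Returns None if the line is absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     123456 kB".
            return int(line.split()[1])
    return None


def read_rss_kib(pid):
    """Read the current resident set size of a running process (Linux only)."""
    with open("/proc/%d/status" % pid) as handle:
        return rss_kib(handle.read())


def growth(samples):
    """Return the change in RSS between consecutive samples."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]
```

Calling `read_rss_kib(pid)` once per completed replication and passing the collected values to `growth` would show whether memory is released between jobs or only accumulates.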



[jira] [Resolved] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind

Posted by "Randall Leeds (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Randall Leeds resolved COUCHDB-1226.
------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.2

Fixed by Filipe in the new replicator. I suspect the pipelining may have been building up too many requests on the target, but no matter. If someone wants to investigate further and try to fix it for 1.1.1 that'd be cool, but it might be easier to backport the new replicator (if that's kosher).

> Replication causes CouchDB to crash.  I *suspect* a memory leak of some kind
> ----------------------------------------------------------------------------
>
>                 Key: COUCHDB-1226
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1226
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 1.1
>         Environment: Gentoo Linux, CouchDB built using standard ebuild.  Rebuilt July 2011.
>            Reporter: James Marca
>             Fix For: 1.2
>
>         Attachments: topcouch.log
>
>
> When replicating databases (pull replication), CouchDB will silently crash.  I suspect a memory leak is leading to the crash, because I watch the beam process slowly creep up in RAM usage, then the server dies.
> For the crashing server, the log on "debug" doesn't seem very helpful.  It says (with manually scrubbed server address):
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for vdsdata/d12/2007/1210882
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10054.0>] didn't find a replication log for vdsdata/d12/2007/1210882
> [Mon, 18 Jul 2011 16:23:20 GMT] [info] [<0.10032.0>] starting new replication "431a3f5bae52a6b27da72e42dc7b9fe3+create_target" at <0.10054.0>
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 1
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #1
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 2
> [Mon, 18 Jul 2011 16:23:20 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #2
> [Mon, 18 Jul 2011 16:23:23 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 10
> [Mon, 18 Jul 2011 16:23:23 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #10
> [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #14
> [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 14
> [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.10070.0>] missing_revs updating committed seq to 20
> [Mon, 18 Jul 2011 16:23:24 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #20
> [Mon, 18 Jul 2011 16:23:25 GMT] [debug] [<0.10054.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 16:23:36 GMT] [info] [<0.10054.0>] recording a checkpoint for http://***.edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 20
> Then, when I restart CouchDB and the node.js program that sets up the replication jobs, the crashed replication job picks up where it left off and completes just fine.  Again, I scrubbed my server addresses in this log snippet:
> [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3562.0>] 'POST' /_replicate {1,1} from "128.*.*.*"
> Headers: [{'Authorization',"Basic amFtZXM6bWdpY24wbWIzcg=="},
>           {'Connection',"close"},
>           {'Content-Type',"application/json"},
>           {'Host',"***[pullserver]***.edu"},
>           {'Transfer-Encoding',"chunked"}]
> [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3562.0>] OAuth Params: []
> [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3580.0>] found a replication log for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/
> [Mon, 18 Jul 2011 17:22:53 GMT] [debug] [<0.3580.0>] found a replication log for vdsdata/d12/2007/1210882
> [Mon, 18 Jul 2011 17:22:53 GMT] [info] [<0.3562.0>] starting new replication "431a3f5bae52a6b27da72e42dc7b9fe3+create_target" at <0.3580.0>
> [Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 22
> [Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #22
> [Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 37
> [Mon, 18 Jul 2011 17:22:56 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #37
> [Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 39
> [Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #39
> [Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 47
> [Mon, 18 Jul 2011 17:22:58 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #47
> [Mon, 18 Jul 2011 17:23:00 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 57
> [Mon, 18 Jul 2011 17:23:00 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #57
> [Mon, 18 Jul 2011 17:23:01 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 17:23:09 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 57
> [Mon, 18 Jul 2011 17:23:19 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #62
> [Mon, 18 Jul 2011 17:23:19 GMT] [debug] [<0.3595.0>] missing_revs updating committed seq to 62
> [Mon, 18 Jul 2011 17:23:22 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #78
> [Mon, 18 Jul 2011 17:23:24 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 17:23:29 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 78
> [Mon, 18 Jul 2011 17:23:57 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #255
> [Mon, 18 Jul 2011 17:24:02 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 17:24:02 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 255
> [Mon, 18 Jul 2011 17:24:09 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: W Processed source update #347
> [Mon, 18 Jul 2011 17:24:09 GMT] [debug] [<0.3580.0>] target doesn't need a full commit
> [Mon, 18 Jul 2011 17:24:09 GMT] [info] [<0.3580.0>] recording a checkpoint for http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882 at source update_seq 347
> [Mon, 18 Jul 2011 17:24:09 GMT] [debug] [<0.83.0>] New task status for 431a3f: http://[sourceserver].edu:5984/vdsdata%2fd12%2f2007%2f1210882/ -> vdsdata/d12/2007/1210882: Finishing
> [Mon, 18 Jul 2011 17:24:09 GMT] [info] [<0.3562.0>] 128.*.*.* - - 'POST' /_replicate 200
> While that replication program ran, I watched top: CouchDB's total share of RAM crept up to 70%, then it crashed.
> Again, the log on the crashing server isn't helpful (more or less the same as above).
> The replication program gets through about 8 to 12 databases before it crashes.
> Each database (when replicated to the target server) takes up on average around 700MB, un-compacted.
> The databases are all similar (annual data for detectors), with one doc per day's data.  Each document is around 700K.
> If there is any more information (or more helpful information) I can provide, please let me know.
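For readers who want to reproduce the setup, the pull replication in the logs above is a POST to /_replicate on the pulling server, and the "+create_target" suffix on the replication id suggests create_target was set. The following is a minimal sketch of such a request, not the reporter's node.js program. The host names and the helper names build_pull_replication and start_replication are assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_pull_replication(source_server, db_name, create_target=True):
    """Return the JSON-ready body for a pull replication of db_name."""
    # Slashes in a database name must be percent-encoded inside a URL,
    # which is why the logs show vdsdata%2fd12%2f2007%2f1210882.
    encoded = urllib.parse.quote(db_name, safe="")
    return {
        "source": f"{source_server.rstrip('/')}/{encoded}/",
        "target": db_name,  # local database on the pulling server
        "create_target": create_target,
    }


def start_replication(target_server, body, timeout=3600):
    """POST the body to /_replicate on the pulling (target) server."""
    req = urllib.request.Request(
        f"{target_server.rstrip('/')}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical source host; the real one is scrubbed in the report.
body = build_pull_replication(
    "http://source.example.edu:5984", "vdsdata/d12/2007/1210882"
)
print(json.dumps(body, indent=2))
```

Calling start_replication("http://localhost:5984", body) would issue the request; it is left out here because it needs a running server and, most likely, credentials.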

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind

Posted by "James Marca (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068661#comment-13068661 ] 

James Marca commented on COUCHDB-1226:
--------------------------------------

I recompiled this morning to version 1.1.1a8d53b7d-git.

Running couchdb from the command line, I got the following crash while replicating:

Crash dump was written to: erl_crash.dump
eheap_alloc: Cannot allocate 3563526520 bytes of memory (of type "heap").
Aborted

I tried to upload the crash dump, but JIRA wouldn't let me for some reason.  I can post it somewhere if it might be useful.
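To put the failed allocation in perspective, the byte count in the eheap_alloc line can be converted to GiB. This is only a small sketch of the arithmetic; the helper names are made up for illustration, and the regular expression assumes the message format shown above.

```python
import re

ERROR_LINE = 'eheap_alloc: Cannot allocate 3563526520 bytes of memory (of type "heap").'


def parse_failed_allocation(line):
    """Return the byte count from a 'Cannot allocate N bytes' line, or None."""
    match = re.search(r"Cannot allocate (\d+) bytes", line)
    return int(match.group(1)) if match else None


def to_gib(n_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


requested = parse_failed_allocation(ERROR_LINE)
print(f"{requested} bytes = {to_gib(requested):.2f} GiB")
# prints "3563526520 bytes = 3.32 GiB"
```

In other words, the VM tried to grow a single heap by roughly 3.3 GiB in one request, several times the size of one of the 700MB databases being replicated.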

> Replication causes CouchDB to crash.  I *suspect* a memory leak of some kind
> ----------------------------------------------------------------------------
>
>                 Key: COUCHDB-1226
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1226
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 1.1
>         Environment: Gentoo Linux, CouchDB built using standard ebuild.  Rebuilt July 2011.
>            Reporter: James Marca
>         Attachments: topcouch.log
>
>

[jira] [Commented] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind

Posted by "Filipe Manana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069302#comment-13069302 ] 

Filipe Manana commented on COUCHDB-1226:
----------------------------------------

Thanks for testing and reporting, James.


[jira] [Commented] (COUCHDB-1226) Replication causes CouchDB to crash. I *suspect* a memory leak of some kind

Posted by "James Marca (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067271#comment-13067271 ] 

James Marca commented on COUCHDB-1226:
--------------------------------------

From erl, I see:
    Erlang R14B03 (erts-5.8.4) [source] [64-bit] [smp:8:8] [rq:8] [async-threads:0] [kernel-poll:false]

As for compaction prior to replication, I have not tried that.  However, the "source" database is created from scratch, with no edits or deletions, so it shouldn't need much compaction, and the "target" is created just prior to replication.

However, I will compact all databases on the "source" machine prior to the next crash/reload/rerun cycle to see if it helps.
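Compaction is triggered per database with a POST to /{db}/_compact, sent with a JSON content type. The sketch below only builds such a request for one of the databases named in the logs. The host name and the helper name build_compaction_request are assumptions, and a real server would most likely also require admin credentials, which are not shown.

```python
import urllib.parse
import urllib.request


def build_compaction_request(server, db_name):
    """Build (but do not send) a POST /{db}/_compact request."""
    # Percent-encode slashes so "vdsdata/d12/..." is treated as one name.
    encoded = urllib.parse.quote(db_name, safe="")
    return urllib.request.Request(
        f"{server.rstrip('/')}/{encoded}/_compact",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical source host; the real one is scrubbed in the report.
req = build_compaction_request(
    "http://source.example.edu:5984", "vdsdata/d12/2007/1210882"
)
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would send it. Looping over every database name on the source server would cover the "compact all databases" step described above.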


> When I let that replication program run and watched top, CouchDB's share of total RAM crept up to 70%, and then it crashed.
> Again, the log on the crashing server isn't helpful (more or less the same as above).
> The replication program gets through about 8 to 12 databases before it crashes.
> Each database (when replicated to the target server) takes up around 700 MB on average, un-compacted.
> The databases are all similar (annual data for detectors), with one doc per day's data.  Each document is around 700 KB.
> If there is any more information (or more helpful information) I can provide, please let me know.
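
For anyone trying to reproduce this, here is a minimal sketch of the kind of pull-replication request the log above implies (a POST to /_replicate on the target, with create_target set). It is not the reporter's actual program. The host names, the default port, and the helper names build_pull_request and replicate are assumptions made for illustration only.

```python
import json
import urllib.request
from urllib.parse import quote


def build_pull_request(source_host, db_name, create_target=True):
    """Build the JSON body for a pull replication of one database.

    source_host is a hypothetical host name; db_name may contain
    slashes, which must be percent-encoded in the source URL.
    """
    source_url = "http://%s:5984/%s" % (source_host, quote(db_name, safe=""))
    return {
        "source": source_url,            # remote database, pulled from
        "target": db_name,               # local database on the target server
        "create_target": create_target,  # create the target if it is missing
    }


def replicate(target_base, body):
    """POST the replication request to the target server's /_replicate.

    target_base is an assumed base URL such as "http://localhost:5984".
    Returns the decoded JSON response.
    """
    req = urllib.request.Request(
        target_base + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example body for one of the databases named in the log
# (the source host here is a placeholder, not the real server).
body = build_pull_request("source.example.edu", "vdsdata/d12/2007/1210882")
```

Calling replicate("http://localhost:5984", body) once per database, in a loop, would approximate the workload described above; running it while logging the beam process's memory use may help confirm whether memory grows with each replicated database.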
