Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2017/12/18 03:38:47 UTC

[GitHub] style95 opened a new issue #1071: [Question] CouchDB crash during benchmark

URL: https://github.com/apache/couchdb/issues/1071
 
 
   CouchDB crashed during benchmarking.
   
   I deployed 3 nodes.
   
   * node1: 10.113.130.91
   * node2: 10.113.130.92
   * node3: 10.113.130.93
   
   I sent 500 docs using the bulk-insert API.
   
   Performance was steady at first, but after about 4 minutes one of the nodes suddenly crashed.
   ![image](https://user-images.githubusercontent.com/3447251/34088722-a1b41b32-e3ee-11e7-840c-486a41f206b9.png)
   
   I got the following logs on the nodes.
   
   **node1**
   ```
   [error] 2017-12-18T03:13:09.183711Z couchdb@10.113.130.91 emulator -------- Error in process <0.21270.8> on node 'couchdb@10.113.130.91' with exit value: {{rexi_DOWN,{'couchdb@10.113.130.92',noconnect}},[{mem3_rpc,rexi_call,2,[{file,"src/mem3_rpc.erl"},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,"src/mem3_rep.erl"},{line,194}]},{mem3_rep,repl,2,[{file,"src/mem3_rep.erl"},...
   [error] 2017-12-18T03:13:09.183768Z couchdb@10.113.130.91 emulator -------- Error in process <0.21304.8> on node 'couchdb@10.113.130.91' with exit value: {{rexi_DOWN,{'couchdb@10.113.130.92',noconnect}},[{mem3_rpc,rexi_call,2,[{file,"src/mem3_rpc.erl"},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,"src/mem3_rep.erl"},{line,194}]},{mem3_rep,repl,2,[{file,"src/mem3_rep.erl"},...
   [warning] 2017-12-18T03:13:09.183766Z couchdb@10.113.130.91 <0.294.0> -------- mem3_sync shards/00000000-1fffffff/lambda-bmt_activations.1513565327 couchdb@10.113.130.92 {{rexi_DOWN,{'couchdb@10.113.130.92',noconnect}},[{mem3_rpc,rexi_call,2,[{file,[115,114,99,47,109,101,109,51,95,114,112,99,46,101,114,108]},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,194}]},{mem3_rep,repl,2,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,175}]},{mem3_rep,go,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,81}]},{mem3_sync,'-start_push_replication/1-fun-0-',2,[{file,[115,114,99,47,109,101,109,51,95,115,121,110,99,46,101,114,108]},{line,208}]}]}
   [error] 2017-12-18T03:13:09.183804Z couchdb@10.113.130.91 emulator -------- Error in process <0.21269.8> on node 'couchdb@10.113.130.91' with exit value: {{rexi_DOWN,{'couchdb@10.113.130.92',noconnect}},[{mem3_rpc,rexi_call,2,[{file,"src/mem3_rpc.erl"},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,"src/mem3_rep.erl"},{line,194}]},{mem3_rep,repl,2,[{file,"src/mem3_rep.erl"},...
   [warning] 2017-12-18T03:13:09.183981Z couchdb@10.113.130.91 <0.294.0> -------- mem3_sync shards/00000000-1fffffff/_global_changes.1513565315 couchdb@10.113.130.92 {{rexi_DOWN,{'couchdb@10.113.130.92',noconnect}},[{mem3_rpc,rexi_call,2,[{file,[115,114,99,47,109,101,109,51,95,114,112,99,46,101,114,108]},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,194}]},{mem3_rep,repl,2,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,175}]},{mem3_rep,go,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,81}]},{mem3_sync,'-start_push_replication/1-fun-0-',2,[{file,[115,114,99,47,109,101,109,51,95,115,121,110,99,46,101,114,108]},{line,208}]}]}
   [warning] 2017-12-18T03:13:09.184206Z couchdb@10.113.130.91 <0.294.0> -------- mem3_sync shards/60000000-7fffffff/lambda-bmt_activations.1513565327 couchdb@10.113.130.92 {{rexi_DOWN,{'couchdb@10.113.130.92',noconnect}},[{mem3_rpc,rexi_call,2,[{file,[115,114,99,47,109,101,109,51,95,114,112,99,46,101,114,108]},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,194}]},{mem3_rep,repl,2,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,175}]},{mem3_rep,go,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,81}]},{mem3_sync,'-start_push_replication/1-fun-0-',2,[{file,[115,114,99,47,109,101,109,51,95,115,121,110,99,46,101,114,108]},{line,208}]}]}
   [error] 2017-12-18T03:13:09.186470Z couchdb@10.113.130.91 emulator -------- Error in process <0.21343.8> on node 'couchdb@10.113.130.91' with exit value: {{rexi_DOWN,{'couchdb@10.113.130.92',noconnect}},[{mem3_rpc,rexi_call,2,[{file,"src/mem3_rpc.erl"},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,"src/mem3_rep.erl"},{line,194}]},{mem3_rep,repl,2,[{file,"src/mem3_rep.erl"},...
   [error] 2017-12-18T03:13:09.186529Z couchdb@10.113.130.91 emulator -------- Error in process <0.21372.8> on node 'couchdb@10.113.130.91' with exit value: {{rexi_DOWN,{'couchdb@10.113.130.92',noconnect}},[{mem3_rpc,rexi_call,2,[{file,"src/mem3_rpc.erl"},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,"src/mem3_rep.erl"},{line,194}]},{mem3_rep,repl,2,[{file,"src/mem3_rep.erl"},...
   [warning] 2017-12-18T03:13:09.186662Z couchdb@10.113.130.91 <0.294.0> -------- mem3_sync shards/80000000-9fffffff/lambda-bmt_activations.1513565327 couchdb@10.113.130.92 {{rexi_DOWN,{'couchdb@10.113.130.92',noconnect}},[{mem3_rpc,rexi_call,2,[{file,[115,114,99,47,109,101,109,51,95,114,112,99,46,101,114,108]},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,194}]},{mem3_rep,repl,2,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,175}]},{mem3_rep,go,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,81}]},{mem3_sync,'-start_push_replication/1-fun-0-',2,[{file,[115,114,99,47,109,101,109,51,95,115,121,110,99,46,101,114,108]},{line,208}]}]}
   [warning] 2017-12-18T03:13:09.186962Z couchdb@10.113.130.91 <0.294.0> -------- mem3_sync shards/c0000000-dfffffff/lambda-bmt_activations.1513565327 couchdb@10.113.130.92 {{rexi_DOWN,{'couchdb@10.113.130.92',noconnect}},[{mem3_rpc,rexi_call,2,[{file,[115,114,99,47,109,101,109,51,95,114,112,99,46,101,114,108]},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,194}]},{mem3_rep,repl,2,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,175}]},{mem3_rep,go,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,81}]},{mem3_sync,'-start_push_replication/1-fun-0-',2,[{file,[115,114,99,47,109,101,109,51,95,115,121,110,99,46,101,114,108]},{line,208}]}]}
   [error] 2017-12-18T03:13:09.188375Z couchdb@10.113.130.91 emulator -------- Error in process <0.21432.8> on node 'couchdb@10.113.130.91' with exit value: {{rexi_DOWN,{'couchdb@10.113.130.92',noconnect}},[{mem3_rpc,rexi_call,2,[{file,"src/mem3_rpc.erl"},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,"src/mem3_rep.erl"},{line,194}]},{mem3_rep,repl,2,[{file,"src/mem3_rep.erl"},...
   [error] 2017-12-18T03:13:09.188415Z couchdb@10.113.130.91 emulator -------- Error in process <0.21424.8> on node 'couchdb@10.113.130.91' with exit value: {{rexi_DOWN,{'couchdb@10.113.130.92',noconnect}},[{mem3_rpc,rexi_call,2,[{file,"src/mem3_rpc.erl"},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,"src/mem3_rep.erl"},{line,194}]},{mem3_rep,repl,2,[{file,"src/mem3_rep.erl"},...
   [warning] 2017-12-18T03:13:09.188476Z couchdb@10.113.130.91 <0.294.0> -------- mem3_sync shards/e0000000-ffffffff/lambda-bmt_activations.1513565327 couchdb@10.113.130.92 {{rexi_DOWN,{'couchdb@10.113.130.92',noconnect}},[{mem3_rpc,rexi_call,2,[{file,[115,114,99,47,109,101,109,51,95,114,112,99,46,101,114,108]},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,194}]},{mem3_rep,repl,2,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,175}]},{mem3_rep,go,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,81}]},{mem3_sync,'-start_push_replication/1-fun-0-',2,[{file,[115,114,99,47,109,101,109,51,95,115,121,110,99,46,101,114,108]},{line,208}]}]}
   [warning] 2017-12-18T03:13:09.188649Z couchdb@10.113.130.91 <0.294.0> -------- mem3_sync shards/20000000-3fffffff/lambda-bmt_activations.1513565327 couchdb@10.113.130.92 {{rexi_DOWN,{'couchdb@10.113.130.92',noconnect}},[{mem3_rpc,rexi_call,2,[{file,[115,114,99,47,109,101,109,51,95,114,112,99,46,101,114,108]},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,194}]},{mem3_rep,repl,2,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,175}]},{mem3_rep,go,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,81}]},{mem3_sync,'-start_push_replication/1-fun-0-',2,[{file,[115,114,99,47,109,101,109,51,95,115,121,110,99,46,101,114,108]},{line,208}]}]}
   [error] 2017-12-18T03:13:09.190738Z couchdb@10.113.130.91 emulator -------- Error in process <0.21483.8> on node 'couchdb@10.113.130.91' with exit value: {{rexi_DOWN,{'couchdb@10.113.130.92',noconnect}},[{mem3_rpc,rexi_call,2,[{file,"src/mem3_rpc.erl"},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,"src/mem3_rep.erl"},{line,194}]},{mem3_rep,repl,2,[{file,"src/mem3_rep.erl"},...
   [error] 2017-12-18T03:13:09.190787Z couchdb@10.113.130.91 emulator -------- Error in process <0.21489.8> on node 'couchdb@10.113.130.91' with exit value: {{rexi_DOWN,{'couchdb@10.113.130.92',noconnect}},[{mem3_rpc,rexi_call,2,[{file,"src/mem3_rpc.erl"},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,"src/mem3_rep.erl"},{line,194}]},{mem3_rep,repl,2,[{file,"src/mem3_rep.erl"},...
   [warning] 2017-12-18T03:13:09.190856Z couchdb@10.113.130.91 <0.294.0> -------- mem3_sync shards/40000000-5fffffff/lambda-bmt_activations.1513565327 couchdb@10.113.130.92 {{rexi_DOWN,{'couchdb@10.113.130.92',noconnect}},[{mem3_rpc,rexi_call,2,[{file,[115,114,99,47,109,101,109,51,95,114,112,99,46,101,114,108]},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,194}]},{mem3_rep,repl,2,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,175}]},{mem3_rep,go,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,81}]},{mem3_sync,'-start_push_replication/1-fun-0-',2,[{file,[115,114,99,47,109,101,109,51,95,115,121,110,99,46,101,114,108]},{line,208}]}]}
   [warning] 2017-12-18T03:13:09.191076Z couchdb@10.113.130.91 <0.294.0> -------- mem3_sync shards/a0000000-bfffffff/lambda-bmt_activations.1513565327 couchdb@10.113.130.92 {{rexi_DOWN,{'couchdb@10.113.130.92',noconnect}},[{mem3_rpc,rexi_call,2,[{file,[115,114,99,47,109,101,109,51,95,114,112,99,46,101,114,108]},{line,269}]},{mem3_rep,calculate_start_seq,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,194}]},{mem3_rep,repl,2,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,175}]},{mem3_rep,go,1,[{file,[115,114,99,47,109,101,109,51,95,114,101,112,46,101,114,108]},{line,81}]},{mem3_sync,'-start_push_replication/1-fun-0-',2,[{file,[115,114,99,47,109,101,109,51,95,115,121,110,99,46,101,114,108]},{line,208}]}]}
   ```
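
A side note on reading these logs: the `{file,[115,114,99,...]}` entries in the `mem3_sync` warnings are Erlang strings printed as lists of character codes. A minimal Python sketch to decode one of them (the helper name `decode_charlist` is made up for illustration):

```python
def decode_charlist(codes):
    """Turn an Erlang character-code list into a readable string."""
    return "".join(chr(c) for c in codes)

# First {file, ...} entry copied from the mem3_sync warning above.
file_entry = [115, 114, 99, 47, 109, 101, 109, 51, 95, 114, 112, 99, 46, 101, 114, 108]
print(decode_charlist(file_entry))  # src/mem3_rpc.erl
```

The decoded path matches the `"src/mem3_rpc.erl"` shown in the `[error]` lines, so the warnings and errors describe the same `rexi_DOWN` stack trace.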
   
   node2 is the node that crashed, but I am not sure what the problem is.
   
   **node2**
   ```
   [error] 2017-12-18T03:12:54.538612Z couchdb@10.113.130.92 <0.1329.0> -------- OS Process died with status: 137
   [error] 2017-12-18T03:12:54.549992Z couchdb@10.113.130.92 <0.1329.0> -------- gen_server <0.1329.0> terminated with reason: {exit_status,137}
     last msg: {#Port<0.8125>,{exit_status,137}}
        state: {os_proc,"./bin/couchjs ./share/server/main.js",#Port<0.8125>,#Fun<couch_os_process.writejson.2>,#Fun<couch_os_process.readjson.1>,5000,300000}
   [info] 2017-12-18T03:12:54.551745Z couchdb@10.113.130.92 <0.213.0> -------- couch_proc_manager <0.1329.0> died {exit_status,137}
   [error] 2017-12-18T03:12:54.552261Z couchdb@10.113.130.92 <0.1329.0> -------- CRASH REPORT Process  (<0.1329.0>) with 0 neighbors exited with reason: {exit_status,137} at gen_server:terminate/6(line:737) <= proc_lib:init_p_do_apply/3(line:237); initial_call: {couch_os_process,init,['Argument__1']}, ancestors: [<0.1328.0>], messages: [], links: [<0.213.0>], dictionary: [], trap_exit: false, status: running, heap_size: 610, stack_size: 27, reductions: 2890
   ```
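
For reference, an exit status above 128 conventionally means the process was terminated by a signal numbered `status - 128`. A small Python sketch of that arithmetic, assuming a Linux host (`describe_exit_status` is a hypothetical helper; the convention comes from Unix shells, not from CouchDB itself):

```python
import signal

def describe_exit_status(status):
    """Map a shell-style exit status to a signal name when status > 128."""
    if status > 128:
        return signal.Signals(status - 128).name
    return "exit code %d" % status

print(describe_exit_status(137))  # SIGKILL
```

So the `couchjs` process appears to have been killed with SIGKILL from outside. On Linux one common source of that is the kernel OOM killer, though nothing in the logs here confirms it.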
   
   
   I also occasionally observed the following error.
   I wonder why one of my nodes is suddenly disallowed.
   
   ```
   [error] 2017-12-18T03:13:06.373063Z couchdb@10.113.130.91 <0.16717.8> -------- ** Connection attempt from disallowed node 'couchdb@10.113.130.92' **
   ```
   
   
   Even under high load, I would expect CouchDB to reject requests rather than crash,
   but I don't know how to configure that.
   
   I used the default configuration with the following changes.
   
   **default.ini**
   ```
   max_dbs_open = 1024
   os_process_limit = 2000
   os_process_soft_limit = 1500
   check_interval = 60
   _default = [{db_fragmentation, "30%"}, {view_fragmentation, "30%"}]
   ```
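
The snippet above omits section headers. Assuming the section names used by the stock CouchDB 2.x `default.ini` (`[couchdb]`, `[query_server_config]`, `[compaction_daemon]`, `[compactions]`; this placement is an assumption, not something shown in the snippet), the overrides can be laid out and sanity-checked with a short Python sketch:

```python
import configparser

# Section headers are assumed from the stock CouchDB 2.x default.ini;
# the values are the ones listed above.
INI_TEXT = """
[couchdb]
max_dbs_open = 1024

[query_server_config]
os_process_limit = 2000
os_process_soft_limit = 1500

[compaction_daemon]
check_interval = 60

[compactions]
_default = [{db_fragmentation, "30%"}, {view_fragmentation, "30%"}]
"""

# interpolation=None keeps the literal "%" characters intact.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(INI_TEXT)

overrides = {s: dict(parser[s]) for s in parser.sections()}
for section, values in overrides.items():
    print(section, values)
```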
   
   **vm.args**
   ```
   +A 1024
   ```
   
   
   I have already read the CouchDB guide and understand what each setting does, but it is not easy to figure out a proper value for each one.
   
   Could anyone help me tune this for production? Or is there a guide to tuning CouchDB?
   It would be great to have recommendations such as `os_process_limit = CPU cores * 10`, `+A = CPU cores * 15`, and so on.
   
   I am using the following machines:
   
   * CPU: 32 cores
   * MEM: 64 GB
   * DISK: 1.3 TB SSD
   * Network: 1G
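
To make the shape of such a rule of thumb concrete, here is what the two example multipliers from the question would give on this 32-core machine. The multipliers are only the placeholders quoted above, not tested or recommended values:

```python
CPU_CORES = 32  # from the machine specs above

# Placeholder multipliers quoted in the question, not recommendations.
OS_PROCESS_PER_CORE = 10
ASYNC_THREADS_PER_CORE = 15

os_process_limit = CPU_CORES * OS_PROCESS_PER_CORE
async_threads = CPU_CORES * ASYNC_THREADS_PER_CORE

print("os_process_limit =", os_process_limit)  # 320
print("+A", async_threads)                      # 480
```

Both results are well below the values currently configured (`os_process_limit = 2000`, `+A 1024`).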
   
