Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2021/06/15 11:35:36 UTC

[GitHub] [couchdb] vladimirralev opened a new issue #3630: Large number of DBs freeze the cluster when a node dies

vladimirralev opened a new issue #3630:
URL: https://github.com/apache/couchdb/issues/3630


   ## Description
   
   I created 100,000 identical test databases with 100 documents each (more in other tests) on a 3-node cluster. I then brought one node down and continued creating databases on the remaining nodes. At that point, creating new DBs stops working and the requests time out. Further tests show a gradual slowdown that becomes noticeable from about 10K DBs and progresses to a completely unusable state at about 60K DBs (while a node is down). When all nodes are up, the nodes sync and the cluster is very fast again.
   
   ## Steps to Reproduce
   
   Build a cluster with 3 machines, r4-couch01 through r4-couch03. Create 100K DBs on r4-couch01. Then bring down the r4-couch03 machine and watch the script freeze. Additional replication attempts with the script fail in the same way. The issue is reproducible with a 4-node cluster as well.
   
   I use this script to replicate the DB many times on r4-couch01:
   
   ```
   for i in {1..200000}
   do
    echo "Doing $i"
    curl -m 9600 -X POST -H "Content-Type: application/json" -d "{\"source\":\"http://user:pass@r4-couch01:5984/testdb100docs\",\"target\":\"http://user:pass@r4-couch01:5984/smalltestdb$i\",\"create_target\":true}" http://user:pass@r4-couch01:5984/_replicate
   done
   ```
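   
   To measure the gradual slowdown rather than eyeball it, the same loop can log how long each replication request takes. This is a minimal sketch, assuming the same host, credentials and database names as the script above. The `build_payload` and `timed_replicate` helpers and the `RUN_REPLICATION` guard variable are names made up for illustration, not part of CouchDB.
   
   ```shell
   # Host and credentials copied from the script above.
   BASE="http://user:pass@r4-couch01:5984"
   
   # Hypothetical helper: build the JSON body that replicates
   # testdb100docs into smalltestdb<N>.
   build_payload() {
     printf '{"source":"%s/testdb100docs","target":"%s/smalltestdb%s","create_target":true}' \
       "$BASE" "$BASE" "$1"
   }
   
   # Hypothetical helper: run one replication and print "<N> <seconds>",
   # using curl's own total-time counter.
   timed_replicate() {
     local secs
     secs=$(curl -s -o /dev/null -m 9600 -w '%{time_total}' -X POST \
       -H "Content-Type: application/json" \
       -d "$(build_payload "$1")" "$BASE/_replicate")
     echo "$1 $secs"
   }
   
   # Only contact the cluster when explicitly asked to
   # (the guard variable is an assumption of this sketch).
   if [ "${RUN_REPLICATION:-0}" = "1" ]; then
     for i in $(seq 1 200000); do
       timed_replicate "$i"
     done
   fi
   ```
   
   Running it with `RUN_REPLICATION=1` and redirecting the output to a file gives one `index seconds` pair per database, which should make the point where requests start to slow down easy to plot.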
   
   
   
   ## Expected Behaviour
   
   I expect the cluster to continue working when one node is down, and even with two nodes down for my config.
   
   ## Your Environment
   
   ```
   {"couchdb":"Welcome","version":"3.1.1","git_sha":"ce596c65d","uuid":"6d44338b0b68f9437184992aa3587239","features":["access-ready","partitioned","pluggable-storage-engines","reshard","scheduler"],"vendor":{"name":"The Apache Software Foundation"}}
   ```
   Settings are the defaults from the distro RPM on CentOS 7:
   ```
   [cluster]
   q=2
   n=3
   ```
   
   Here are the stats for one DB:
   ```
   {"db_name":"smalltestdb1","purge_seq":"0-g1AAAAB_eJzLYWBgYMpgTmFQTc4vTc5ISXIoMtEFMw0M9TLzSvSKMvPSkyrzEnNT9ZLzc3NAyvNYgCRDA5D6DwRZiQwk6k9kSKqHaMwCANUcKg4","update_seq":"99-g1AAAACFeJzLYWBgYMpgTmFQTc4vTc5ISXIoMtEFMw0M9TLzSvSKMvPSkyrzEnNT9ZLzc3NAyvNYgCRDA5D6DwRZSQwMrKtJNCKRIakeqpeNJQsAvewqyg","sizes":{"file":8352220,"external":8386353,"active":6437405},"props":{},"doc_del_count":0,"doc_count":100,"disk_format_version":8,"compact_running":false,"cluster":{"q":2,"n":3,"w":2,"r":2},"instance_start_time":"0"}
   ```
   
   This test was done on CouchDB 3.1.1, but the same issue is present on CouchDB 2.1 as well. The only known version that doesn't suffer from this is BigCouch (0.4.1).
   
   The same issue is present with DBs of every size tested, from 100 documents to 10K documents.
   
   ## Additional Context
   
   I tried it both with debug logging and with no logging enabled, to rule out excessive logging as the cause.
   
   Debug logs show this:
   ```
   [notice] 2021-06-13T07:32:32.180795Z couchdb@r4-couch01.internal.com <0.271.0> -------- rexi_server_mon : cluster unstable
   [notice] 2021-06-13T07:32:32.180959Z couchdb@r4-couch01.internal.com <0.270.0> -------- rexi_buffer : cluster unstable
   [notice] 2021-06-13T07:32:32.180951Z couchdb@r4-couch01.internal.com <0.1127.0> -------- couch_replicator_clustering : cluster unstable
   [notice] 2021-06-13T07:32:32.181281Z couchdb@r4-couch01.internal.com <0.1788.0> -------- Stopping replicator db changes listener <0.24052.80>
   [notice] 2021-06-13T07:32:32.181103Z couchdb@r4-couch01.internal.com <0.265.0> -------- rexi_server_mon : cluster unstable
   [debug] 2021-06-13T07:32:32.181523Z couchdb@r4-couch01.internal.com <0.269.0> -------- Supervisor rexi_buffer_sup started rexi_buffer:start_link('rexi_buffer_couchdb@r4-couch03.internal.com') at pid <0.5747.81>
   [notice] 2021-06-13T07:32:32.181613Z couchdb@r4-couch01.internal.com <0.264.0> -------- rexi_server : cluster unstable
   [debug] 2021-06-13T07:32:32.182003Z couchdb@r4-couch01.internal.com <0.263.0> -------- Supervisor rexi_server_sup started rexi_server:start_link('rexi_server_couchdb@r4-couch03.internal.com') at pid <0.9730.81>
   .......
   .....
   ....
   [debug] 2021-06-13T07:32:37.676293Z couchdb@r4-couch01.internal.com <0.289.0> -------- adding shards/00000000-7fffffff/smalltestdb1.1622759864 -> 'couchdb@r4-couch03.internal.com' to mem3_sync queue
   [debug] 2021-06-13T07:32:37.677179Z couchdb@r4-couch01.internal.com <0.289.0> -------- adding shards/80000000-ffffffff/smalltestdb10.1622759936 -> 'couchdb@r4-couch03.internal.com' to mem3_sync queue
   [debug] 2021-06-13T07:32:37.677550Z couchdb@r4-couch01.internal.com <0.289.0> -------- adding shards/00000000-7fffffff/smalltestdb10.1622759936 -> 'couchdb@r4-couch03.internal.com' to mem3_sync queue
   [debug] 2021-06-13T07:32:37.680024Z couchdb@r4-couch01.internal.com <0.289.0> -------- adding shards/00000000-7fffffff/smalltestdb100.1622760748 -> 'couchdb@r4-couch03.internal.com' to mem3_sync queue
   .....
   ```
   ... this goes on for a long time, and at some point the logs stop printing.
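   
   The volume of these log lines follows from the cluster settings. With `q=2` and `n=3` on a 3-node cluster, every node holds a copy of both shards of every database. A back-of-the-envelope sketch, assuming one `mem3_sync` queue entry per local shard toward the down node, as the log lines above suggest (the variable names are illustrative):
   
   ```shell
   dbs=100000   # databases created in the test
   q=2          # shards per database, from [cluster] q
   n=3          # copies of each shard, from [cluster] n
   
   # Shard copies across the whole cluster.
   shard_copies_total=$((dbs * q * n))
   # With n equal to the node count, each node holds every shard once.
   shards_per_node=$((dbs * q))
   
   echo "total shard copies: $shard_copies_total"
   echo "shards per node: $shards_per_node"
   ```
   
   Under that assumption, each surviving node would have to enqueue 200,000 shards for r4-couch03 when it goes down.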
   
   I did some analysis in a remsh session and took snapshots of the process list. The only interesting thing I found is that some processes had queued messages related to a node being shut down.
   
   * Before I shut down a node:
   ```
   (couchdb@r4-couch01.internal.com)22> P1 = [process_info(P) || P<-processes()].
   ```
   * After I shut down a node:
   ```
   (couchdb@r4-couch01.internal.com)23> P2 = [process_info(P) || P<-processes()].
   ```
   * After the node is down and I start the replication process:
   ```
   (couchdb@r4-couch01.internal.com)25> P3 = [process_info(P) || P<-processes()].
   ```
   
   Here are some results:
   ```
   (couchdb@r4-couch01.internal.com)30> length(P1).                                                                      
   765
   (couchdb@r4-couch01.internal.com)31> length(P2).
   768
   (couchdb@r4-couch01.internal.com)32> length(P3).
   851
   (couchdb@r4-couch01.internal.com)36> rp(lists:filter(fun(A) -> proplists:get_value(message_queue_len,A)>0 end, P1)).
   []
   ok
   (couchdb@r4-couch01.internal.com)37> rp(lists:filter(fun(A) -> proplists:get_value(message_queue_len,A)>0 end, P2)).
   []
   ok
   (couchdb@r4-couch01.internal.com)38> rp(lists:filter(fun(A) -> proplists:get_value(message_queue_len,A)>0 end, P3)).
   [[{current_function,{mochiweb_http,request,3}},
     {initial_call,{proc_lib,init_p,5}},
     {status,waiting},
     {message_queue_len,2},
     {messages,[{'DOWN',#Ref<0.3967453924.1266155522.11768>,
                        process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1266155522.11769>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc}]},
     {links,[<0.1860.0>,#Port<0.3326587>]},
     {dictionary,[{dont_log_request,true},
                  {chttpd_stats,{st,1,0,0}},
                  {nonce,"361658b552"},
                  {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
                  {body_time,0},
                  {'$initial_call',{mochiweb_acceptor,init,4}},
                  {couch_rewrite_count,0},
                  {dont_log_response,true}]},
     {trap_exit,false},
     {error_handler,error_handler},
     {priority,normal},
     {group_leader,<0.899.0>},
     {total_heap_size,1598},
     {heap_size,1598},
     {stack_size,10},
     {reductions,144190},
     {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                          {min_bin_vheap_size,46422},
                          {min_heap_size,233},
                          {fullsweep_after,65535},
                          {minor_gcs,0}]},
     {suspending,[]}],
    [{current_function,{couch_doc,'-doc_from_multi_part_stream/4-fun-2-',
                                  1}},
     {initial_call,{proc_lib,init_p,5}},
     {status,waiting},
     {message_queue_len,2},
     {messages,[{'DOWN',#Ref<0.3967453924.1266155522.11478>,
                        process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1266155522.11479>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc}]},
     {links,[<0.1860.0>,#Port<0.3325869>]},
     {dictionary,[{chttpd_stats,{st,0,0,0}},
                  {dont_log_request,true},
                  {nonce,"8d13a81621"},
                  {mochiweb_request_recv,true},
                  {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
                  {'$initial_call',{mochiweb_acceptor,init,4}},
                  {mochiweb_request_cookie,[{"AuthSession",
                                             "YWRtaW46NjBDODUwNjQ61MgmkVAqh88n6uJmT3BT6tBZ88I"}]},
                  {mochiweb_request_qs,[{"new_edits","false"}]},
                  {mp_att_writers,3},
                  {couch_rewrite_count,0},
                  {dont_log_response,true}]},
     {trap_exit,false},
     {error_handler,error_handler},
     {priority,normal},
     {group_leader,<0.899.0>},
     {total_heap_size,4185},
     {heap_size,4185},
     {stack_size,38},
     {reductions,36373},
     {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                          {min_bin_vheap_size,46422},
                          {min_heap_size,233},
                          {fullsweep_after,65535},
                          {minor_gcs,0}]},
     {suspending,[]}],
    [{current_function,{couch_doc,'-doc_from_multi_part_stream/4-fun-2-',
                                  1}},
     {initial_call,{proc_lib,init_p,5}},
     {status,waiting},
     {message_queue_len,4},
     {messages,[{'DOWN',#Ref<0.3967453924.1263271939.104768>,
                        process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1263271939.104769>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1266155522.12280>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1266155522.12281>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc}]},
     {links,[<0.1860.0>,#Port<0.3325871>]},
     {dictionary,[{chttpd_stats,{st,0,0,0}},
                  {dont_log_request,true},
                  {nonce,"ed60378cdd"},
                  {mochiweb_request_recv,true},
                  {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
                  {'$initial_call',{mochiweb_acceptor,init,4}},
                  {mochiweb_request_cookie,[{"AuthSession",
                                             "YWRtaW46NjBDODUwNjQ61MgmkVAqh88n6uJmT3BT6tBZ88I"}]},
                  {mochiweb_request_qs,[{"new_edits","false"}]},
                  {mp_att_writers,3},
                  {couch_rewrite_count,0},
                  {dont_log_response,true}]},
     {trap_exit,false},
     {error_handler,error_handler},
     {priority,normal},
     {group_leader,<0.899.0>},
     {total_heap_size,4185},
     {heap_size,4185},
     {stack_size,38},
     {reductions,116983},
     {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                          {min_bin_vheap_size,46422},
                          {min_heap_size,233},
                          {fullsweep_after,65535},
                          {minor_gcs,0}]},
     {suspending,[]}],
    [{current_function,{couch_doc,'-doc_from_multi_part_stream/4-fun-2-',
                                  1}},
     {initial_call,{proc_lib,init_p,5}},
     {status,waiting},
     {message_queue_len,4},
     {messages,[{'DOWN',#Ref<0.3967453924.1259077639.153906>,
                        process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1259077639.153907>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1259864070.217118>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1259864070.217119>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc}]},
     {links,[<0.1860.0>,#Port<0.3325887>]},
     {dictionary,[{chttpd_stats,{st,0,0,0}},
                  {dont_log_request,true},
                  {nonce,"81ea0debc3"},
                  {mochiweb_request_recv,true},
                  {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
                  {'$initial_call',{mochiweb_acceptor,init,4}},
                  {mochiweb_request_cookie,[{"AuthSession",
                                             "YWRtaW46NjBDODUwNjQ61MgmkVAqh88n6uJmT3BT6tBZ88I"}]},
                  {mochiweb_request_qs,[{"new_edits","false"}]},
                  {mp_att_writers,3},
                  {couch_rewrite_count,0},
                  {dont_log_response,true}]},
     {trap_exit,false},
     {error_handler,error_handler},
     {priority,normal},
     {group_leader,<0.899.0>},
     {total_heap_size,4185},
     {heap_size,4185},
     {stack_size,38},
     {reductions,95303},
     {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                          {min_bin_vheap_size,46422},
                          {min_heap_size,233},
                          {fullsweep_after,65535},
                          {minor_gcs,0}]},
     {suspending,[]}],
    [{current_function,{couch_doc,'-doc_from_multi_part_stream/4-fun-2-',
                                  1}},
     {initial_call,{proc_lib,init_p,5}},
     {status,waiting},
     {message_queue_len,4},
     {messages,[{'DOWN',#Ref<0.3967453924.1260912645.28924>,
                        process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1260912645.28925>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1259864070.217319>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1259864070.217320>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc}]},
     {links,[<0.1860.0>,#Port<0.3328368>]},
     {dictionary,[{dont_log_request,true},
                  {chttpd_stats,{st,0,0,0}},
                  {nonce,"12aac7e7f3"},
                  {body_time,0},
                  {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
                  {mochiweb_request_recv,true},
                  {'$initial_call',{mochiweb_acceptor,init,4}},
                  {mochiweb_request_cookie,[{"AuthSession",
                                             "YWRtaW46NjBDODUwNjQ61MgmkVAqh88n6uJmT3BT6tBZ88I"}]},
                  {mochiweb_request_qs,[{"new_edits","false"}]},
                  {mp_att_writers,3},
                  {couch_rewrite_count,0},
                  {dont_log_response,true}]},
     {trap_exit,false},
     {error_handler,error_handler},
     {priority,normal},
     {group_leader,<0.899.0>},
     {total_heap_size,8370},
     {heap_size,4185},
     {stack_size,38},
     {reductions,93093},
     {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                          {min_bin_vheap_size,46422},
                          {min_heap_size,233},
                          {fullsweep_after,65535},
                          {minor_gcs,6}]},
     {suspending,[]}],
    [{current_function,{couch_doc,'-doc_from_multi_part_stream/4-fun-2-',
                                  1}},
     {initial_call,{proc_lib,init_p,5}},
     {status,waiting},
     {message_queue_len,2},
     {messages,[{'DOWN',#Ref<0.3967453924.1272709121.79576>,
                        process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1272709121.79577>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc}]},
     {links,[<0.1860.0>,#Port<0.3325877>]},
     {dictionary,[{chttpd_stats,{st,0,0,0}},
                  {dont_log_request,true},
                  {nonce,"fa5229fa02"},
                  {mochiweb_request_recv,true},
                  {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
                  {'$initial_call',{mochiweb_acceptor,init,4}},
                  {mochiweb_request_cookie,[{"AuthSession",
                                             "YWRtaW46NjBDODUwNjQ61MgmkVAqh88n6uJmT3BT6tBZ88I"}]}, 
                  {mochiweb_request_qs,[{"new_edits","false"}]},
                  {mp_att_writers,3},
                  {couch_rewrite_count,0},
                  {dont_log_response,true}]},
     {trap_exit,false},
     {error_handler,error_handler},
     {priority,normal},
     {group_leader,<0.899.0>},
     {total_heap_size,6772},
     {heap_size,6772},
     {stack_size,38},
     {reductions,24324},
     {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                          {min_bin_vheap_size,46422},
                          {min_heap_size,233},
                          {fullsweep_after,65535},
                          {minor_gcs,0}]},
     {suspending,[]}],
    [{current_function,{couch_doc,'-doc_from_multi_part_stream/4-fun-2-',
                                  1}},
     {initial_call,{proc_lib,init_p,5}},
     {status,waiting},
     {message_queue_len,2},
     {messages,[{'DOWN',#Ref<0.3967453924.1258029065.53052>,
                        process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1258029065.53053>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc}]},
     {links,[<0.1860.0>,#Port<0.3328379>]},
     {dictionary,[{chttpd_stats,{st,0,0,0}},
                  {dont_log_request,true},
                  {nonce,"d159db4198"},
                  {mochiweb_request_recv,true},
                  {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
                  {'$initial_call',{mochiweb_acceptor,init,4}},
                  {mochiweb_request_cookie,[{"AuthSession",
                                             "YWRtaW46NjBDODUwNjQ61MgmkVAqh88n6uJmT3BT6tBZ88I"}]},
                  {mochiweb_request_qs,[{"new_edits","false"}]},
                  {mp_att_writers,3},
                  {couch_rewrite_count,0},
                  {dont_log_response,true}]},
     {trap_exit,false},
     {error_handler,error_handler},
     {priority,normal},
     {group_leader,<0.899.0>},
     {total_heap_size,6772},
     {heap_size,6772},
     {stack_size,38},
     {reductions,66541},
     {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                          {min_bin_vheap_size,46422},
                          {min_heap_size,233},
                          {fullsweep_after,65535},
                          {minor_gcs,0}]},
     {suspending,[]}],
    [{current_function,{couch_doc,'-doc_from_multi_part_stream/4-fun-2-',
                                  1}},
     {initial_call,{proc_lib,init_p,5}},
     {status,waiting},
     {message_queue_len,2},
     {messages,[{'DOWN',#Ref<0.3967453924.1272709121.79756>,
                        process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1272709121.79757>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc}]},
     {links,[<0.1860.0>,#Port<0.3326440>]},
     {dictionary,[{chttpd_stats,{st,0,0,0}},
                  {dont_log_request,true},
                  {nonce,"abe0194147"},
                  {mochiweb_request_recv,true},
                  {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
                  {'$initial_call',{mochiweb_acceptor,init,4}},
                  {mochiweb_request_cookie,[{"AuthSession",
                                             "YWRtaW46NjBDODUwNjQ61MgmkVAqh88n6uJmT3BT6tBZ88I"}]},
                  {mochiweb_request_qs,[{"new_edits","false"}]},
                  {mp_att_writers,3},
                  {couch_rewrite_count,0},
                  {dont_log_response,true}]},
     {trap_exit,false},
     {error_handler,error_handler},
     {priority,normal},
     {group_leader,<0.899.0>},
     {total_heap_size,10958},
     {heap_size,10958},
     {stack_size,38},
     {reductions,36476},
     {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                          {min_bin_vheap_size,46422},
                          {min_heap_size,233},
                          {fullsweep_after,65535},
                          {minor_gcs,0}]},
     {suspending,[]}],
    [{current_function,{couch_doc,'-doc_from_multi_part_stream/4-fun-2-',
                                  1}},
     {initial_call,{proc_lib,init_p,5}},
     {status,waiting},
     {message_queue_len,2},
     {messages,[{'DOWN',#Ref<0.3967453924.1261174788.117014>,
                        process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1261174788.117015>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc}]},
     {links,[<0.1860.0>,#Port<0.3328371>]},
     {dictionary,[{chttpd_stats,{st,0,0,0}},
                  {dont_log_request,true},
                  {nonce,"ae567f647b"},
                  {mochiweb_request_recv,true},
                  {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
                  {'$initial_call',{mochiweb_acceptor,init,4}},
                  {mochiweb_request_cookie,[{"AuthSession",
                                             "YWRtaW46NjBDODUwNjQ61MgmkVAqh88n6uJmT3BT6tBZ88I"}]},
                  {mochiweb_request_qs,[{"new_edits","false"}]},
                  {mp_att_writers,3},
                  {couch_rewrite_count,0},
                  {dont_log_response,true}]},
     {trap_exit,false},
     {error_handler,error_handler},
     {priority,normal},
     {group_leader,<0.899.0>},
     {total_heap_size,6772},
     {heap_size,6772},
     {stack_size,38},
     {reductions,25192},
     {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                          {min_bin_vheap_size,46422},
                          {min_heap_size,233},
                          {fullsweep_after,65535},
                          {minor_gcs,0}]},
     {suspending,[]}],
    [{current_function,{mochiweb_http,request,3}},
     {initial_call,{proc_lib,init_p,5}},
     {status,waiting},
     {message_queue_len,8},
     {messages,[{'DOWN',#Ref<0.3967453924.1259077639.154250>,
                        process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1259077639.154251>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1259864070.217188>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1259864070.217189>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1259077639.155027>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1259077639.155028>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1263271939.105867>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1263271939.105868>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc}]},
     {links,[<0.1860.0>,#Port<0.3328386>]},
     {dictionary,[{'$initial_call',{mochiweb_acceptor,init,4}},
                  {dont_log_request,true},
                  {chttpd_stats,{st,1,0,0}},
                  {nonce,"89487db213"},
                  {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
                  {couch_rewrite_count,0},
                  {dont_log_response,true}]},
     {trap_exit,false},
     {error_handler,error_handler},
     {priority,normal},
     {group_leader,<0.899.0>},
     {total_heap_size,1598},
     {heap_size,1598},
     {stack_size,10},
     {reductions,242662},
     {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                          {min_bin_vheap_size,46422},
                          {min_heap_size,233},
                          {fullsweep_after,65535},
                          {minor_gcs,0}]},
     {suspending,[]}],
    [{current_function,{mochiweb_http,request,3}},
     {initial_call,{proc_lib,init_p,5}},
     {status,waiting},
     {message_queue_len,4},
     {messages,[{'DOWN',#Ref<0.3967453924.1266155522.12100>,
                        process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1266155522.12101>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1259077639.154847>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1259077639.154848>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc}]},
     {links,[<0.1860.0>,#Port<0.3328364>]},
     {dictionary,[{dont_log_request,true},
                  {chttpd_stats,{st,1,0,0}},
                  {nonce,"6eeaf9de1b"},
                  {body_time,0},
                  {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
                  {'$initial_call',{mochiweb_acceptor,init,4}},
                  {couch_rewrite_count,0}, 
                  {dont_log_response,true}]},
     {trap_exit,false},
     {error_handler,error_handler},
     {priority,normal},
     {group_leader,<0.899.0>},
     {total_heap_size,1598},
     {heap_size,1598},
     {stack_size,10},
     {reductions,147112},
     {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                          {min_bin_vheap_size,46422},
                          {min_heap_size,233},
                          {fullsweep_after,65535},
                          {minor_gcs,0}]},
     {suspending,[]}],
    [{current_function,{mochiweb_http,request,3}},
     {initial_call,{proc_lib,init_p,5}},
     {status,waiting},
     {message_queue_len,2},
     {messages,[{'DOWN',#Ref<0.3967453924.1259077639.154716>,
                        process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1259077639.154717>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc}]},
     {links,[<0.1860.0>,#Port<0.3328374>]},
     {dictionary,[{'$initial_call',{mochiweb_acceptor,init,4}},
                  {dont_log_request,true},
                  {chttpd_stats,{st,1,0,0}},
                  {nonce,"a8c1643078"},
                  {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
                  {couch_rewrite_count,0},
                  {dont_log_response,true}]},
     {trap_exit,false},
     {error_handler,error_handler},
     {priority,normal},
     {group_leader,<0.899.0>},
     {total_heap_size,2586},
     {heap_size,2586},
     {stack_size,10},
     {reductions,101521},
     {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                          {min_bin_vheap_size,46422},
                          {min_heap_size,233},
                          {fullsweep_after,65535},
                          {minor_gcs,0}]},
     {suspending,[]}],
    [{current_function,{mochiweb_http,request,3}},
     {initial_call,{proc_lib,init_p,5}},
     {status,waiting},
     {message_queue_len,2},
     {messages,[{'DOWN',#Ref<0.3967453924.1272709121.80670>,
                        process,
                        {stream,'couchdb@r4-couch01.internal.com'}, 
                        noproc},
                {'DOWN',#Ref<0.3967453924.1272709121.80671>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc}]},
     {links,[<0.1860.0>,#Port<0.3328340>]},
     {dictionary,[{rand_seed,{#{bits => 58,jump => #Fun<rand.8.15449617>,
                                next => #Fun<rand.5.15449617>,type => exrop,
                                uniform => #Fun<rand.6.15449617>,
                                uniform_n => #Fun<rand.7.15449617>,weak_low_bits => 1}, 
                              [192787032645768316|82777130814936740]}},
                  {dont_log_request,true},
                  {chttpd_stats,{st,1,0,0}},
                  {nonce,"5700ea3731"},
                  {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
                  {'$initial_call',{mochiweb_acceptor,init,4}},
                  {couch_rewrite_count,0},
                  {dont_log_response,true}]},
     {trap_exit,false},
     {error_handler,error_handler},
     {priority,normal},
     {group_leader,<0.899.0>},
     {total_heap_size,1598},
     {heap_size,1598},
     {stack_size,10},
     {reductions,290815},
     {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                          {min_bin_vheap_size,46422},
                          {min_heap_size,233},
                          {fullsweep_after,65535},
                          {minor_gcs,0}]},
     {suspending,[]}],
    [{current_function,{couch_doc,'-doc_from_multi_part_stream/4-fun-2-',
                                  1}},
     {initial_call,{proc_lib,init_p,5}},
     {status,waiting},
     {message_queue_len,4},
     {messages,[{'DOWN',#Ref<0.3967453924.1260912645.28519>,
                        process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1260912645.28520>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1263271939.105060>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1263271939.105061>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc}]},
     {links,[<0.1860.0>,#Port<0.3328372>]},
     {dictionary,[{chttpd_stats,{st,0,0,0}},
                  {dont_log_request,true},
                  {nonce,"6997339d4c"},
                  {mochiweb_request_recv,true},
                  {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
                  {'$initial_call',{mochiweb_acceptor,init,4}},
                  {mochiweb_request_cookie,[{"AuthSession",
                                             "YWRtaW46NjBDODUwNjQ61MgmkVAqh88n6uJmT3BT6tBZ88I"}]},
                  {mochiweb_request_qs,[{"new_edits","false"}]},
                  {mp_att_writers,3},
                  {couch_rewrite_count,0},
                  {dont_log_response,true}]},
     {trap_exit,false},
     {error_handler,error_handler},
     {priority,normal},
     {group_leader,<0.899.0>},
     {total_heap_size,6772}, 
     {heap_size,6772},
     {stack_size,38},
     {reductions,38690},
     {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                          {min_bin_vheap_size,46422},
                          {min_heap_size,233},
                          {fullsweep_after,65535},
                          {minor_gcs,0}]},
     {suspending,[]}],
    [{current_function,{mochiweb_http,request,3}},
     {initial_call,{proc_lib,init_p,5}},
     {status,waiting},
     {message_queue_len,4},
     {messages,[{'DOWN',#Ref<0.3967453924.1261174788.117682>,
                        process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc}, 
                {'DOWN',#Ref<0.3967453924.1261174788.117683>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1259864070.217131>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1259864070.217132>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc}]},
     {links,[<0.1860.0>,#Port<0.3328373>]},
     {dictionary,[{'$initial_call',{mochiweb_acceptor,init,4}},
                  {dont_log_request,true},
                  {chttpd_stats,{st,1,0,0}},
                  {nonce,"fd552b2275"},
                  {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
                  {couch_rewrite_count,0},
                  {dont_log_response,true}]},
     {trap_exit,false},
     {error_handler,error_handler},
     {priority,normal},
     {group_leader,<0.899.0>},
     {total_heap_size,1598},
     {heap_size,1598},
     {stack_size,10},
     {reductions,163990},
     {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                          {min_bin_vheap_size,46422},
                          {min_heap_size,233},
                          {fullsweep_after,65535},
                          {minor_gcs,0}]},
     {suspending,[]}],
    [{current_function,{couch_doc,'-doc_from_multi_part_stream/4-fun-2-',
                                  1}},
     {initial_call,{proc_lib,init_p,5}},
     {status,waiting},
     {message_queue_len,2},
     {messages,[{'DOWN',#Ref<0.3967453924.1263271939.104969>,
                        process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1263271939.104970>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc}]},
     {links,[<0.1860.0>,#Port<0.3326592>]},
     {dictionary,[{chttpd_stats,{st,0,0,0}},
                  {dont_log_request,true},
                  {nonce,"9af6f2a5ae"},
                  {mochiweb_request_recv,true},
                  {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
                  {'$initial_call',{mochiweb_acceptor,init,4}},
                  {mochiweb_request_cookie,[{"AuthSession",
                                             "YWRtaW46NjBDODUwNjQ61MgmkVAqh88n6uJmT3BT6tBZ88I"}]},
                  {mochiweb_request_qs,[{"new_edits","false"}]},
                  {mp_att_writers,3},
                  {couch_rewrite_count,0},
                  {dont_log_response,true}]},
     {trap_exit,false},
     {error_handler,error_handler},
     {priority,normal},
     {group_leader,<0.899.0>},
     {total_heap_size,10958},
     {heap_size,10958},
     {stack_size,38},
     {reductions,47625},
     {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                          {min_bin_vheap_size,46422},
                          {min_heap_size,233},
                          {fullsweep_after,65535},
                          {minor_gcs,0}]},
     {suspending,[]}],
    [{current_function,{mochiweb_http,request,3}},
     {initial_call,{proc_lib,init_p,5}},
     {status,waiting},
     {message_queue_len,2},
     {messages,[{'DOWN',#Ref<0.3967453924.1263271939.105412>,
                        process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1263271939.105413>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc}]},
     {links,[<0.1860.0>,#Port<0.3326418>]},
     {dictionary,[{dont_log_request,true},
                  {chttpd_stats,{st,1,0,0}},
                  {nonce,"06c04da577"},
                  {body_time,0},
                  {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
                  {'$initial_call',{mochiweb_acceptor,init,4}},
                  {couch_rewrite_count,0},
                  {dont_log_response,true}]},
     {trap_exit,false},
     {error_handler,error_handler},
     {priority,normal},
     {group_leader,<0.899.0>},
     {total_heap_size,1598},
     {heap_size,1598},
     {stack_size,10},
     {reductions,112627},
     {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                          {min_bin_vheap_size,46422},
                          {min_heap_size,233},
                          {fullsweep_after,65535},
                          {minor_gcs,0}]},
     {suspending,[]}],
    [{current_function,{mochiweb_http,request,3}},
     {initial_call,{proc_lib,init_p,5}},
     {status,waiting},
     {message_queue_len,1},
     {messages,[{'DOWN',#Ref<0.3967453924.1258553352.77156>,
                        process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc}]},
     {links,[<0.1860.0>,#Port<0.3326607>]},
     {dictionary,[{'$initial_call',{mochiweb_acceptor,init,4}},
                  {dont_log_request,true},
                  {chttpd_stats,{st,1,0,0}},
                  {nonce,"630bdcffe5"},
                  {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
                  {couch_rewrite_count,0},
                  {dont_log_response,true}]}, 
     {trap_exit,false},
     {error_handler,error_handler},
     {priority,normal},
     {group_leader,<0.899.0>},
     {total_heap_size,2586},
     {heap_size,2586},
     {stack_size,10},
     {reductions,103263},
     {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                          {min_bin_vheap_size,46422},
                          {min_heap_size,233},
                          {fullsweep_after,65535},
                          {minor_gcs,0}]},
     {suspending,[]}],
    [{current_function,{mochiweb_http,request,3}},
     {initial_call,{proc_lib,init_p,5}},
     {status,waiting},
     {message_queue_len,2},
     {messages,[{'DOWN',#Ref<0.3967453924.1258029065.53021>,
                        process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc},
                {'DOWN',#Ref<0.3967453924.1258029065.53022>,process,
                        {stream,'couchdb@r4-couch01.internal.com'},
                        noproc}]},
     {links,[<0.1860.0>,#Port<0.3328391>]},
     {dictionary,[{dont_log_request,true},
                  {chttpd_stats,{st,1,0,0}},
                  {nonce,"d09a203770"},
                  {body_time,0},
                  {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
                  {'$initial_call',{mochiweb_acceptor,init,4}},
                  {couch_rewrite_count,0},
                  {dont_log_response,true}]},
     {trap_exit,false},
     {error_handler,error_handler},
     {priority,normal},
     {group_leader,<0.899.0>},
     {total_heap_size,1598},
     {heap_size,1598},
     {stack_size,10},
     {reductions,24075},
     {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                          {min_bin_vheap_size,46422},
                          {min_heap_size,233},
                          {fullsweep_after,65535},
                          {minor_gcs,0}]},
     {suspending,[]}]]
   ok
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] vladimirralev commented on issue #3630: Large number of DBs freeze the cluster when a node dies

Posted by GitBox <gi...@apache.org>.
vladimirralev commented on issue #3630:
URL: https://github.com/apache/couchdb/issues/3630#issuecomment-864076808


   Thanks for the response. I think this loop is a common backup strategy: replicate all DBs to a backup server as fast as you can, overnight or otherwise, sometimes in parallel.
   
   That being said, this issue has been observed with a slow buildup of databases and the rate of creating the DBs in the example is probably not related to the root cause. I'll be trying to find the root cause of this and any hints are appreciated.





[GitHub] [couchdb] skeyby commented on issue #3630: Large number of DBs freeze the cluster when a node dies

Posted by GitBox <gi...@apache.org>.
skeyby commented on issue #3630:
URL: https://github.com/apache/couchdb/issues/3630#issuecomment-873961109


   I experienced this problem as well on our clusters. After a lot of trial and error, I think the problem is related to the synchronization of the internal _dbs database across nodes. Maybe you can look into that.
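
   A minimal sketch of one way to look into this, under stated assumptions: ask each node for its node-local copy of a database's shard map document in `_dbs` and compare the revisions. The `/_node/_local/_dbs/{db}` path comes from the CouchDB sharding guide; the function names `extract_rev`, `shard_map_rev` and `compare_shard_map` are hypothetical, the host names and `user:pass` credentials are the placeholders from the script in this issue, and a missing or differing revision is only a hint that `_dbs` is lagging on that node, not a confirmed root cause.

   ```shell
   # extract_rev: read one compact JSON document on stdin, print its "_rev".
   # Assumes the compact form {"_rev":"..."} with no space after the colon.
   extract_rev() {
     sed -n 's/.*"_rev":"\([^"]*\)".*/\1/p'
   }

   # shard_map_rev BASE_URL DB: fetch DB's shard map document from the
   # node-local _dbs database of the node behind BASE_URL (admin access needed).
   shard_map_rev() {
     curl -s -m 30 "$1/_node/_local/_dbs/$2" | extract_rev
   }

   # compare_shard_map DB BASE_URL...: print one "BASE_URL REV" line per node,
   # or "BASE_URL missing" when the node returned no revision.
   compare_shard_map() {
     db=$1; shift
     for base in "$@"; do
       rev=$(shard_map_rev "$base" "$db")
       echo "$base ${rev:-missing}"
     done
   }
   ```

   Example call: `compare_shard_map smalltestdb1 http://user:pass@r4-couch01:5984 http://user:pass@r4-couch02:5984`. If every node prints the same revision, the shard map for that database has reached all of them.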


-- 
To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org



[GitHub] [couchdb] iilyak commented on issue #3630: Large number of DBs freeze the cluster when a node dies

Posted by GitBox <gi...@apache.org>.
iilyak commented on issue #3630:
URL: https://github.com/apache/couchdb/issues/3630#issuecomment-873973954


   > That being said, this issue has been observed with a slow buildup of databases and the rate of creating the DBs in the example is probably not related to the root cause.
   
   It could be related to the rate of creation of databases. CouchDB uses an LRU cache and keeps only a limited number of databases open. When you create databases rapidly, you exceed the LRU cache size. Since most of the requests create new databases, those databases are definitely not in the cache. When the LRU cache is over its limit, CouchDB starts closing databases.
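
   To illustrate the effect (a toy model, not CouchDB's actual cache code): the sketch below simulates an LRU cache of a fixed capacity and counts hits and evictions for a sequence of database names. The function name `simulate_lru` and the tiny capacity are made up for the example; in CouchDB the limit on open databases is set by `max_dbs_open` in the `[couchdb]` config section.

   ```shell
   # simulate_lru CAPACITY NAME...: toy LRU model, prints "hits=H evictions=E".
   # Names must not contain whitespace; the cache is a space-separated list
   # with the least recently used name first.
   simulate_lru() {
     capacity=$1; shift
     cache=""
     hits=0
     evictions=0
     for name in "$@"; do
       kept=""
       count=0
       found=0
       # Rebuild the list without $name, noting whether it was present.
       for entry in $cache; do
         if [ "$entry" = "$name" ]; then
           found=1
         else
           kept="$kept $entry"
           count=$((count + 1))
         fi
       done
       if [ "$found" -eq 1 ]; then
         hits=$((hits + 1))
       elif [ "$count" -ge "$capacity" ]; then
         # Cache is full: drop the least recently used name (front of the list).
         kept=${kept# }
         case $kept in
           *" "*) kept=" ${kept#* }" ;;
           *) kept="" ;;
         esac
         evictions=$((evictions + 1))
       fi
       # The name just used becomes the most recently used entry.
       cache="$kept $name"
     done
     echo "hits=$hits evictions=$evictions"
   }
   ```

   With a capacity of 2, `simulate_lru 2 a b c d` prints `hits=0 evictions=2`, while `simulate_lru 2 a b a b` prints `hits=2 evictions=0`: a stream of brand-new names never hits the cache and forces an eviction on every request once the cache is full.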





[GitHub] [couchdb] vladimirralev commented on issue #3630: Large number of DBs freeze the cluster when a node dies

Posted by GitBox <gi...@apache.org>.
vladimirralev commented on issue #3630:
URL: https://github.com/apache/couchdb/issues/3630#issuecomment-874622769


   I think CouchDB 3 does eager indexing, which causes a huge asynchronous CPU/IO follow-up load after a replication completes for specific DBs. I have to pace the replications because of this, but that's fine and not related to the issue. I am setting up a new build here with more logging, but so far it does look like each DB is polling independently for health, or at least logging something, and that is causing the sudden spike of queued messages.
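
   A rough sketch of the pacing, with assumptions called out: before starting the next replication, poll `_active_tasks` and wait until no task of type `indexer` is listed. That background index builds show up there as `indexer` tasks is an assumption for this sketch, and `count_indexers` and `wait_for_indexers` are hypothetical helper names, not part of CouchDB.

   ```shell
   # count_indexers: read an _active_tasks JSON array on stdin and print how
   # many tasks have type "indexer". Assumes compact JSON ("type":"indexer").
   count_indexers() {
     grep -o '"type":"indexer"' | wc -l | tr -d ' '
   }

   # wait_for_indexers BASE_URL [INTERVAL]: block until _active_tasks lists no
   # indexer tasks, checking every INTERVAL seconds (default 5).
   # Note: a failed request yields empty output, which counts as zero tasks.
   wait_for_indexers() {
     base=$1
     interval=${2:-5}
     while [ "$(curl -s -m 30 "$base/_active_tasks" | count_indexers)" -gt 0 ]; do
       sleep "$interval"
     done
   }
   ```

   In the replication loop this would be called as `wait_for_indexers http://user:pass@r4-couch01:5984` after each `_replicate` request.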





[GitHub] [couchdb] janl commented on issue #3630: Large number of DBs freeze the cluster when a node dies

Posted by GitBox <gi...@apache.org>.
janl commented on issue #3630:
URL: https://github.com/apache/couchdb/issues/3630#issuecomment-863933342


   hey hey, just a quick note without going into too much detail. The test you are running (creating a very large number of databases in a tight loop) is not a use case that CouchDB 3.x will be very happy with. I’m sure there are things we can improve, but this isn’t a use case I see us optimising for a lot, unless someone contributes compelling PRs.




