Posted to user@couchdb.apache.org by KageDS <al...@kageds.com> on 2020/01/27 14:19:05 UTC

Cluster outage due to problem on one node

Hello list,

We recently had an outage on a CouchDB cluster in which a problem on one 
node blocked all writes and took down the application.

We are running a 6-node cluster, with 3 nodes in dc1 and 3 in dc2, and 
the following settings:

[cluster]
placement = dc1:2,dc2:2
q=8
r=2
w=2
n=3
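
For reference, here is the quorum arithmetic I would expect from these settings. It is only a sketch, assuming a write succeeds once w of the n copies of a shard acknowledge it; the function name is made up for illustration and is not part of CouchDB.

```python
def write_quorum_reachable(n: int, w: int, failed_copies: int) -> bool:
    """Return True if at least w of the n shard copies can still acknowledge a write."""
    return (n - failed_copies) >= w


# With n=3 and w=2, losing one copy of a shard should still leave a quorum.
print(write_quorum_reachable(n=3, w=2, failed_copies=1))
# Losing two copies of the same shard would not.
print(write_quorum_reachable(n=3, w=2, failed_copies=2))
```

On that assumption, a single failing node would not by itself be expected to block writes.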

CouchDB version:

  curl -X GET http://127.0.0.1:5984
{"couchdb":"Welcome","version":"2.3.0","git_sha":"07ea0c7","uuid":"cb381f835e3922fbc183ce3eb05c1da8","features":["pluggable-storage-engines","scheduler"],"vendor":{"name":"The Apache Software Foundation"}}

Log files from 5 of the nodes show this error:

[error] 2020-01-23T17:58:30.640197Z couchdb@d1b1.sip.local 
<0.10369.1330> -------- fabric_worker_timeout 
open_revs,'couchdb@d2b3.sip.local',<<"shards/40000000-5fffffff/account/2b/94/20ca9a5e0d16ddccf18f4fa20e8b.1548248531">>
[error] 2020-01-23T17:58:30.641167Z couchdb@d1b1.sip.local 
<0.19108.1333> -------- fabric_worker_timeout 
open_doc,'couchdb@d2b3.sip.local',<<"shards/40000000-5fffffff/account/2b/94/20ca9a5e0d16ddccf18f4fa20e8b.1548248531">>
[error] 2020-01-23T17:58:33.035259Z couchdb@d1b1.sip.local <0.1810.1334> 
-------- fabric_worker_timeout 
open_revs,'couchdb@d2b3.sip.local',<<"shards/40000000-5fffffff/account/39/47/df16b2d43514a7d41dcf7f4c8d73.1548248538">>
[error] 2020-01-23T17:58:33.037218Z couchdb@d1b1.sip.local 
<0.14681.1330> -------- fabric_worker_timeout 
open_doc,'couchdb@d2b3.sip.local',<<"shards/40000000-5fffffff/account/39/47/df16b2d43514a7d41dcf7f4c8d73.1548248538">>
[error] 2020-01-23T17:58:50.634245Z couchdb@d1b1.sip.local 
<0.17849.1330> 3a094d698b fabric_worker_timeout 
open_doc,'couchdb@d2b3.sip.local',<<"shards/20000000-3fffffff/services.1548248909">>
[error] 2020-01-23T17:58:52.388263Z couchdb@d1b1.sip.local 
<0.19332.1329> 96611add99 fabric_worker_timeout 
open_doc,'couchdb@d2b3.sip.local',<<"shards/e0000000-ffffffff/accounts.1548248787">>
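
To confirm that the timeouts all name the same target node, the entries can be tallied with a short script. This is a minimal sketch, assuming the log format shown above; the function name and the regular expression are my own, and the sample is two of the entries above.

```python
import re
from collections import Counter

# Matches: fabric_worker_timeout <operation>,'<target node>',...
TIMEOUT_RE = re.compile(r"fabric_worker_timeout\s+(\w+),'([^']+)'")


def count_timeouts(log_text: str) -> Counter:
    """Count fabric_worker_timeout entries per (target node, operation)."""
    # Long log lines get wrapped in mail, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return Counter((node, op) for op, node in TIMEOUT_RE.findall(flat))


sample = """
[error] 2020-01-23T17:58:30.640197Z couchdb@d1b1.sip.local
<0.10369.1330> -------- fabric_worker_timeout
open_revs,'couchdb@d2b3.sip.local',<<"shards/40000000-5fffffff/account/2b/94/20ca9a5e0d16ddccf18f4fa20e8b.1548248531">>
[error] 2020-01-23T17:58:50.634245Z couchdb@d1b1.sip.local
<0.17849.1330> 3a094d698b fabric_worker_timeout
open_doc,'couchdb@d2b3.sip.local',<<"shards/20000000-3fffffff/services.1548248909">>
"""

for (node, op), hits in count_timeouts(sample).items():
    print(node, op, hits)
```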

This points to a problem on d2b3. The log on that node shows:


/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:29.275556Z couchdb@d2b3.sip.local <0.26325.1238> 
-------- rexi_server: from: couchdb@d2b1.sip.local(<14354.16318.1129>) 
mfa: fabric_rpc:open_revs/4 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,with_db,3,[{file,"src/fabric_rpc.erl"},{line,331}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:29.275362Z couchdb@d2b3.sip.local <0.18708.1236> 
-------- rexi_server: from: couchdb@d2b3.sip.local(<0.1032.1234>) mfa: 
fabric_rpc:open_revs/4 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,with_db,3,[{file,"src/fabric_rpc.erl"},{line,331}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:29.275584Z couchdb@d2b3.sip.local <0.31988.1233> 
-------- rexi_server: from: couchdb@d1b3.sip.local(<14353.32154.1338>) 
mfa: fabric_rpc:open_revs/4 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,with_db,3,[{file,"src/fabric_rpc.erl"},{line,331}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:29.275596Z couchdb@d2b3.sip.local <0.15622.1235> 
-------- rexi_server: from: couchdb@d1b2.sip.local(<14352.30846.1369>) 
mfa: fabric_rpc:open_revs/4 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,with_db,3,[{file,"src/fabric_rpc.erl"},{line,331}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:29.275741Z couchdb@d2b3.sip.local <0.8781.1235> 
-------- rexi_server: from: couchdb@d2b2.sip.local(<14355.25402.1152>) 
mfa: fabric_rpc:open_revs/4 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,with_db,3,[{file,"src/fabric_rpc.erl"},{line,331}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:29.276072Z couchdb@d2b3.sip.local <0.797.1235> -------- 
rexi_server: from: couchdb@d2b1.sip.local(<14354.28647.1129>) mfa: 
fabric_rpc:update_docs/3 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,read_repair_filter,4,[{file,"src/fabric_rpc.erl"},{line,349}]},{fabric_rpc,update_docs,3,[{file,"src/fabric_rpc.erl"},{line,274}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:29.276243Z couchdb@d2b3.sip.local <0.12311.1234> 
-------- rexi_server: from: couchdb@d1b3.sip.local(<14353.14446.1337>) 
mfa: fabric_rpc:update_docs/3 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,read_repair_filter,4,[{file,"src/fabric_rpc.erl"},{line,349}]},{fabric_rpc,update_docs,3,[{file,"src/fabric_rpc.erl"},{line,274}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:29.277333Z couchdb@d2b3.sip.local <0.23752.1233> 
-------- rexi_server: from: couchdb@d1b2.sip.local(<14352.13570.1369>) 
mfa: fabric_rpc:update_docs/3 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,read_repair_filter,4,[{file,"src/fabric_rpc.erl"},{line,349}]},{fabric_rpc,update_docs,3,[{file,"src/fabric_rpc.erl"},{line,274}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:29.278143Z couchdb@d2b3.sip.local <0.6221.1234> 
-------- rexi_server: from: couchdb@d2b3.sip.local(<0.5950.1235>) mfa: 
fabric_rpc:update_docs/3 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,read_repair_filter,4,[{file,"src/fabric_rpc.erl"},{line,349}]},{fabric_rpc,update_docs,3,[{file,"src/fabric_rpc.erl"},{line,274}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:29.278168Z couchdb@d2b3.sip.local <0.10161.1234> 
-------- rexi_server: from: couchdb@d2b2.sip.local(<14355.31878.1152>) 
mfa: fabric_rpc:update_docs/3 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,read_repair_filter,4,[{file,"src/fabric_rpc.erl"},{line,349}]},{fabric_rpc,update_docs,3,[{file,"src/fabric_rpc.erl"},{line,274}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:29.278193Z couchdb@d2b3.sip.local <0.11752.1236> 
1f52cdfea8 rexi_server: from: couchdb@d1b1.sip.local(<14351.5309.1331>) 
mfa: fabric_rpc:map_view/5 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,map_view,5,[{file,"src/fabric_rpc.erl"},{line,148}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:29.278016Z couchdb@d2b3.sip.local <0.12329.1234> 
1f52cdfea8 rexi_server: from: couchdb@d1b1.sip.local(<14351.5309.1331>) 
mfa: fabric_rpc:map_view/5 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,map_view,5,[{file,"src/fabric_rpc.erl"},{line,148}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:29.278083Z couchdb@d2b3.sip.local <0.31002.1236> 
1f52cdfea8 rexi_server: from: couchdb@d1b1.sip.local(<14351.5309.1331>) 
mfa: fabric_rpc:map_view/5 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,map_view,5,[{file,"src/fabric_rpc.erl"},{line,148}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:29.278238Z couchdb@d2b3.sip.local <0.13296.1235> 
1f52cdfea8 rexi_server: from: couchdb@d1b1.sip.local(<14351.5309.1331>) 
mfa: fabric_rpc:map_view/5 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,map_view,5,[{file,"src/fabric_rpc.erl"},{line,148}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:35.180064Z couchdb@d2b3.sip.local <0.963.1234> -------- 
fabric_worker_timeout 
open_doc,'couchdb@d2b3.sip.local',<<"shards/40000000-5fffffff/account/2b/94/20ca9a5e0d16ddccf18f4fa20e8b.1548248531">>
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:35.191265Z couchdb@d2b3.sip.local <0.28963.1233> 
-------- fabric_worker_timeout 
open_revs,'couchdb@d2b3.sip.local',<<"shards/40000000-5fffffff/account/2b/94/20ca9a5e0d16ddccf18f4fa20e8b.1548248531">>
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:37.646983Z couchdb@d2b3.sip.local <0.18624.1233> 
-------- fabric_worker_timeout 
open_revs,'couchdb@d2b3.sip.local',<<"shards/40000000-5fffffff/account/39/47/df16b2d43514a7d41dcf7f4c8d73.1548248538">>
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:40.738966Z couchdb@d2b3.sip.local <0.2331.1233> 
-------- rexi_server: from: couchdb@d2b1.sip.local(<14354.29531.1130>) 
mfa: fabric_rpc:open_revs/4 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,with_db,3,[{file,"src/fabric_rpc.erl"},{line,331}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:40.739545Z couchdb@d2b3.sip.local <0.12476.1232> 
-------- rexi_server: from: couchdb@d1b1.sip.local(<14351.21208.1333>) 
mfa: fabric_rpc:open_revs/4 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,with_db,3,[{file,"src/fabric_rpc.erl"},{line,331}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:40.739759Z couchdb@d2b3.sip.local <0.22395.1235> 
-------- rexi_server: from: couchdb@d1b2.sip.local(<14352.13905.1370>) 
mfa: fabric_rpc:open_revs/4 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,with_db,3,[{file,"src/fabric_rpc.erl"},{line,331}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:40.740098Z couchdb@d2b3.sip.local <0.32082.1236> 
-------- rexi_server: from: couchdb@d1b2.sip.local(<14352.28036.1369>) 
mfa: fabric_rpc:open_doc/3 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,with_db,3,[{file,"src/fabric_rpc.erl"},{line,331}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:40.740428Z couchdb@d2b3.sip.local <0.24543.1234> 
-------- rexi_server: from: couchdb@d2b2.sip.local(<14355.20090.1150>) 
mfa: fabric_rpc:open_revs/4 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,with_db,3,[{file,"src/fabric_rpc.erl"},{line,331}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:40.741481Z couchdb@d2b3.sip.local <0.19440.1234> 
-------- rexi_server: from: couchdb@d1b3.sip.local(<14353.25323.1335>) 
mfa: fabric_rpc:open_revs/4 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,with_db,3,[{file,"src/fabric_rpc.erl"},{line,331}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:40.742429Z couchdb@d2b3.sip.local <0.2034.1236> 
-------- rexi_server: from: couchdb@d1b1.sip.local(<14351.16958.1327>) 
mfa: fabric_rpc:update_docs/3 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,read_repair_filter,4,[{file,"src/fabric_rpc.erl"},{line,349}]},{fabric_rpc,update_docs,3,[{file,"src/fabric_rpc.erl"},{line,274}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:40.742587Z couchdb@d2b3.sip.local <0.25660.1235> 
-------- rexi_server: from: couchdb@d1b2.sip.local(<14352.32337.1370>) 
mfa: fabric_rpc:update_docs/3 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,read_repair_filter,4,[{file,"src/fabric_rpc.erl"},{line,349}]},{fabric_rpc,update_docs,3,[{file,"src/fabric_rpc.erl"},{line,274}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:40.742765Z couchdb@d2b3.sip.local <0.20392.1236> 
-------- rexi_server: from: couchdb@d2b2.sip.local(<14355.2799.1152>) 
mfa: fabric_rpc:update_docs/3 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,read_repair_filter,4,[{file,"src/fabric_rpc.erl"},{line,349}]},{fabric_rpc,update_docs,3,[{file,"src/fabric_rpc.erl"},{line,274}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:40.742889Z couchdb@d2b3.sip.local <0.3251.1233> 
-------- rexi_server: from: couchdb@d2b1.sip.local(<14354.12701.1130>) 
mfa: fabric_rpc:update_docs/3 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,read_repair_filter,4,[{file,"src/fabric_rpc.erl"},{line,349}]},{fabric_rpc,update_docs,3,[{file,"src/fabric_rpc.erl"},{line,274}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:40.743525Z couchdb@d2b3.sip.local <0.15372.1236> 
-------- rexi_server: from: couchdb@d1b3.sip.local(<14353.16323.1335>) 
mfa: fabric_rpc:update_docs/3 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,read_repair_filter,4,[{file,"src/fabric_rpc.erl"},{line,349}]},{fabric_rpc,update_docs,3,[{file,"src/fabric_rpc.erl"},{line,274}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:40.778236Z couchdb@d2b3.sip.local <0.18751.1234> 
-------- rexi_server: from: couchdb@d2b3.sip.local(<0.15370.1235>) mfa: 
fabric_rpc:open_revs/4 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,with_db,3,[{file,"src/fabric_rpc.erl"},{line,331}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
/var/log/couchdb/couchdb.log-20200124.gz:[error] 
2020-01-23T17:58:40.780347Z couchdb@d2b3.sip.local <0.10143.1237> 
-------- rexi_server: from: couchdb@d2b3.sip.local(<0.19312.1235>) mfa: 
fabric_rpc:update_docs/3 error:function_clause 
[{couch_db,incref,[undefined],[{file,"src/couch_db.erl"},{line,184}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,85}]},{fabric_rpc,read_repair_filter,4,[{file,"src/fabric_rpc.erl"},{line,349}]},{fabric_rpc,update_docs,3,[{file,"src/fabric_rpc.erl"},{line,274}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,140}]}]
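
The rexi_server errors above can be grouped by the RPC call that failed, to see which operations were affected. Again, this is only a sketch, assuming the log format shown above; the function name and the regular expression are my own, and the sample entries are cut off before the stack trace.

```python
import re
from collections import Counter

# Matches: rexi_server: from: <node>(<pid>) mfa: <module:function/arity> error:<class>
REXI_RE = re.compile(
    r"rexi_server: from: ([^\s(]+)\(<[^>]*>\) mfa: (\S+) error:(\w+)"
)


def summarize_rexi_errors(log_text: str) -> Counter:
    """Count rexi_server errors per (failed call, error class)."""
    # Long log lines get wrapped in mail, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return Counter((mfa, err) for _caller, mfa, err in REXI_RE.findall(flat))


# Two entries from the d2b3 log, stack traces omitted.
sample = """
2020-01-23T17:58:29.275556Z couchdb@d2b3.sip.local <0.26325.1238>
-------- rexi_server: from: couchdb@d2b1.sip.local(<14354.16318.1129>)
mfa: fabric_rpc:open_revs/4 error:function_clause
2020-01-23T17:58:29.276072Z couchdb@d2b3.sip.local <0.797.1235> --------
rexi_server: from: couchdb@d2b1.sip.local(<14354.28647.1129>) mfa:
fabric_rpc:update_docs/3 error:function_clause
"""

for (mfa, err), hits in summarize_rexi_errors(sample).items():
    print(mfa, err, hits)
```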

I restarted CouchDB on d2b3 only and service was restored.

Can anyone suggest the cause of this problem or what I can do to debug 
the issue further?

Many thanks,

Alan