Posted to notifications@couchdb.apache.org by "Tiago Pereira (JIRA)" <ji...@apache.org> on 2017/03/23 10:17:41 UTC

[jira] [Created] (COUCHDB-3338) rexi_server timeout

Tiago Pereira created COUCHDB-3338:
--------------------------------------

             Summary: rexi_server timeout
                 Key: COUCHDB-3338
                 URL: https://issues.apache.org/jira/browse/COUCHDB-3338
             Project: CouchDB
          Issue Type: Bug
          Components: Database Core, JavaScript View Server, Replication
            Reporter: Tiago Pereira


Hi,

I recently went to production with a CouchDB 2 instance, but I'm experiencing severe issues that appeared when usage increased. The database slows down, stops indexing / returning my views, and ultimately crashes because it consumes too much memory: a freshly started instance runs at ~300 MB of RAM, but after a few hours it jumps to almost 3 GB, at which point the system runs out of memory and the instance crashes.

I'm running my CouchDB instance on a Kubernetes cluster hosted on Google Cloud, using klaemo/couchdb2's latest 2.0.0 image as the base Docker image.

This instance currently has 792 active, continuous replications running, which I assume might be what is causing the slowdown: when I disabled them and rebooted, the database appeared to run fine without the replications.
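In case it helps, this is roughly how I arrived at that count: filtering the `/_active_tasks` output for continuous replications. The sketch below runs against made-up sample data rather than a live node (the real call is a GET on `/_active_tasks`); the field names follow the documented `_active_tasks` response format.

```python
import json

# Sample of what GET /_active_tasks returns (data made up for illustration;
# on my node the filtered list has 792 entries).
sample_active_tasks = json.loads("""
[
  {"type": "replication", "continuous": true, "replication_id": "abc+continuous"},
  {"type": "replication", "continuous": false, "replication_id": "def"},
  {"type": "indexer", "database": "shards/00000000-1fffffff/db_name.1478875836"}
]
""")

# Keep only tasks that are replications AND marked continuous.
continuous = [t for t in sample_active_tasks
              if t.get("type") == "replication" and t.get("continuous")]

print(len(continuous))
```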

When I consult the logs, I see a lot of the following error messages, which I'm assuming might be the culprit:
```
[error] 2017-03-23T09:55:06.804700Z nonode@nohost <0.14942.246> 6388d61064 rexi_server exit:{timeout,{gen_server,call,[couch_server,{open,<<"shards/00000000-1fffffff/db_name.1478875836">>,[{timeout,100},{user_ctx,{user_ctx,<<"replications">>,[<<"services_replicator">>,<<"budgets_replicator">>,<<"tasks_replicator">>,<<"comments_replicator">>],<<"default">>}}]},100]}} [{gen_server,call,3,[{file,"gen_server.erl"},{line,190}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,86}]},{couch_db,open,2,[{file,"src/couch_db.erl"},{line,91}]},{fabric_rpc,open_shard,2,[{file,"src/fabric_rpc.erl"},{line,248}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]

[error] 2017-03-23T09:55:06.834468Z nonode@nohost <0.28090.247> 6aa0e19144 rexi_server exit:{timeout,{gen_server,call,[couch_server,{open,<<"shards/20000000-3fffffff/db_name">>,[{timeout,200},{user_ctx,{user_ctx,<<"replications">>,[<<"services_replicator">>,<<"budgets_replicator">>,<<"tasks_replicator">>,<<"comments_replicator">>],<<"default">>}}]},200]}} [{gen_server,call,3,[{file,"gen_server.erl"},{line,190}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,86}]},{couch_db,open,2,[{file,"src/couch_db.erl"},{line,91}]},{fabric_rpc,open_shard,2,[{file,"src/fabric_rpc.erl"},{line,248}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]

[error] 2017-03-23T09:55:06.834516Z nonode@nohost <0.1093.212> 1c317bf387 rexi_server exit:{timeout,{gen_server,call,[couch_server,{open,<<"shards/20000000-3fffffff/db_name">>,[{timeout,200},{user_ctx,{user_ctx,<<"replications">>,[<<"services_replicator">>,<<"budgets_replicator">>,<<"tasks_replicator">>,<<"comments_replicator">>],<<"default">>}}]},200]}} [{gen_server,call,3,[{file,"gen_server.erl"},{line,190}]},{couch_server,open,2,[{file,"src/couch_server.erl"},{line,86}]},{couch_db,open,2,[{file,"src/couch_db.erl"},{line,91}]},{fabric_rpc,open_shard,2,[{file,"src/fabric_rpc.erl"},{line,248}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,139}]}]
```
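To get a feel for which shards and timeout values are involved, I pull the shard name and the `couch_server` open timeout out of each error line, roughly like this (a quick hypothetical helper; the log line below is an abbreviated copy of the first one above):

```python
import re

# Abbreviated copy of the first rexi_server error line from the log.
line = ('[error] 2017-03-23T09:55:06.804700Z nonode@nohost <0.14942.246> '
        '6388d61064 rexi_server exit:{timeout,{gen_server,call,[couch_server,'
        '{open,<<"shards/00000000-1fffffff/db_name.1478875836">>,'
        '[{timeout,100},...]},100]}}')

# Capture the shard binary and the timeout (in ms) from the open call.
m = re.search(r'\{open,<<"(?P<shard>[^"]+)">>.*?\{timeout,(?P<ms>\d+)\}', line)
print(m.group('shard'), m.group('ms'))
# → shards/00000000-1fffffff/db_name.1478875836 100
```

Tallying these per shard shows the timeouts are spread across shards rather than isolated to one.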

Could I have some help with this issue? I'm not sure whether this is an actual memory leak, or whether, for the number of replications I'm running, I should simply expect to need a node with more RAM to handle all the live replications.
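For completeness, I haven't tuned the `couch_server` side at all; as far as I can tell the node is running with the documented defaults, e.g. (values below are the documented 2.0.0 defaults, not something I've changed or verified on this node):

```ini
; local.ini
[couchdb]
; maximum number of database files couch_server keeps open at once
max_dbs_open = 500
```

If the replications are churning through more shard files than this, that could plausibly relate to the open timeouts, but I'm guessing.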



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)