Posted to user@couchdb.apache.org by Mike Ray <ta...@me.com.INVALID> on 2018/09/28 11:46:45 UTC

Troubleshooting emfile, descriptors and shards

Env: 

CouchDB, NodeJS, Nano on Debian 8
~30 DBs with identical design docs, ranging from 5 docs to 3000

I’ve been working with CouchDB for a while in development, and now that I’m scaling up for production I’m hitting issues.

Specifically, my shards and file descriptors seem to be spiralling out of control.

No doubt it’s something in my code rather than an issue with CouchDB per se, but I don’t see a way of finding out what’s going on. The problem doesn’t correlate with the number of docs: using Prometheus I can see a couple of the DBs are hitting several thousand allocated file descriptors, and I then hit ‘emfile’ errors (I’ve increased the available descriptors as per the documentation, but I guess that’s a temporary fix at best, as things grow). Additionally, shards grow to crazy sizes, and compaction does little to fix the problem. The only solution I’ve found is to replicate the DB to a temp DB, then destroy the original and replicate back.

How can I debug this? And, is there any way of writing middleware for Nano during the ExpressJS request so I can see what’s going on?
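(One way to get visibility here, sketched under assumptions: rather than relying on nano internals, you can decorate the nano methods your app uses with a generic counter/timer. Everything below except the wrapper itself is illustrative wiring, not nano API.)

```javascript
// Generic wrapper: decorate any async nano method (db.view, db.insert, ...)
// so every CouchDB call is counted and timed. Pure JS, no nano internals.
const stats = { inFlight: 0, total: 0 };

function wrap(name, fn) {
  return async (...args) => {
    stats.inFlight++;
    stats.total++;
    const started = Date.now();
    try {
      return await fn(...args);
    } finally {
      stats.inFlight--;
      console.log(`[couch] ${name}: ${Date.now() - started}ms, in flight: ${stats.inFlight}`);
    }
  };
}

// Hypothetical wiring (names are illustrative, not guaranteed nano API):
//   const db = require('nano')('http://localhost:5984/mydb');
//   db.view = wrap('view', db.view.bind(db));
// and an Express middleware to log calls made while serving each request:
//   app.use((req, res, next) => {
//     const before = stats.total;
//     res.on('finish', () =>
//       console.log(`${req.path}: ${stats.total - before} couch calls`));
//     next();
//   });
```

Some older nano releases also documented a `log` function option at instantiation; check the README for your version.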


Re: Troubleshooting emfile, descriptors and shards

Posted by Mike Ray <ta...@me.com.INVALID>.
There are 10 design docs on each DB, with between 1 and 12 views on each (most have either 1 or 2).

Good to know about the file descriptors, I’m less worried about that now :) What about max DBs - my model uses one DB per organisation in a multi-tenanted system, so I will go over 100 quite quickly in production. And there are operations run on each DB every clock minute, so the auto-closing might not help?

I should look at the q value, it’s 8 at the moment - what does this do exactly? I see in the settings it’s under cluster - I’m not clustering, at least not yet...

Re shards - not deleting much at the moment. Output on the largest DB below for info - does this look ok?

{
  "update_seq": "7149-g1AAAAFreJzLYWBg4MhgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUoxJTIkyf___z8riYGBvQiPuiQFIJlkD1XK9AefUgeQ0niY0on4lCaAlNbDlD7AozSPBUgyNAApoOr5IOXMygSVL4Ao3w9WrklQ-QGI8vtgx5wnqPwBRDnE7Z-zALmgdDo",
  "sizes": {
    "file": 116946525,
    "external": 4077076,
    "active": 11025462
  },
  "purge_seq": 0,
  "other": {
    "data_size": 4077076
  },
  "doc_del_count": 580,
  "doc_count": 3324,
  "disk_size": 116946525,
  "disk_format_version": 6,
  "data_size": 11025462,
  "compact_running": false,
  "instance_start_time": "0"
}
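(A quick sanity check on those numbers, in plain Node with the values copied from the output: `active` is the live data the shard files should shrink towards after compaction, `file` is what is on disk now.)

```javascript
// Numbers copied from the GET /{db} output above.
const sizes = { file: 116946525, active: 11025462 };

// 'file' is ~10.6x 'active', so a successful compaction should reclaim
// roughly 90% of the disk space for this DB.
const overhead = sizes.file / sizes.active;
console.log(`file is ${overhead.toFixed(1)}x its active data`);
```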


From: Joan Touzet <wo...@apache.org>
Reply: user@couchdb.apache.org <us...@couchdb.apache.org>, wohali@apache.org <wo...@apache.org>
Date: 29 September 2018 at 03:35:30
To: user@couchdb.apache.org <us...@couchdb.apache.org>
Subject:  Re: Troubleshooting emfile, descriptors and shards  

How many views do you have per db? Remember that each view will use the  
same 'q' value as the database. A single q=8 database with ~30 active  
views will use  

1 db + 30 views = 31 * 8 shards = 248 shards per node  
248 * n=3 replicas = 744 shard replicas across 3 nodes  

There is absolutely nothing wrong with increasing your limit on file  
descriptors to a few hundred thousand - Couch can handle this.  

If your databases are that small, I recommend changing your default 'q'
value to 1, which will cut your file handle count by a factor of 8: 30
views in a single database will then result in just 31 shards.

As for the big shards...are you deleting a lot of documents? You might  
have a runaway problem with tombstones. `GET /{db}` will give you these  
statistics.  

-Joan  


Re: Troubleshooting emfile, descriptors and shards

Posted by Joan Touzet <wo...@apache.org>.
How many views do you have per db? Remember that each view will use the
same 'q' value as the database. A single q=8 database with ~30 active
views will use

1 db + 30 views = 31 * 8 shards = 248 shards per node
248 * n=3 replicas = 744 shard replicas across 3 nodes
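(The arithmetic above as a tiny helper, for experimenting with different `q` and `n` values - one file per database plus one per view, times shards, times replicas:)

```javascript
// Shard files scale with (1 db file + one file per view) * q shards * n replicas.
function shardFiles(views, q, n = 1) {
  return (1 + views) * q * n;
}

console.log(shardFiles(30, 8));    // 248 shards per node
console.log(shardFiles(30, 8, 3)); // 744 shard replicas across 3 nodes
console.log(shardFiles(30, 1, 3)); // 93 with q=1
```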

There is absolutely nothing wrong with increasing your limit on file
descriptors to a few hundred thousand - Couch can handle this.

If your databases are that small, I recommend changing your default 'q'
value to 1, which will cut your file handle count by a factor of 8: 30
views in a single database will then result in just 31 shards.

As for the big shards...are you deleting a lot of documents? You might
have a runaway problem with tombstones. `GET /{db}` will give you these
statistics.
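(The tombstone check can be sketched as a helper run over each database's `GET /{db}` output; the 0.5 threshold below is an arbitrary example value, not a CouchDB default:)

```javascript
// Fraction of all docs (live + deleted) that are tombstones.
function tombstoneRatio(info) {
  const total = info.doc_count + info.doc_del_count;
  return total === 0 ? 0 : info.doc_del_count / total;
}

// With the stats posted elsewhere in this thread: 580 / (3324 + 580),
// i.e. about 15% -- noticeable, but not a runaway tombstone case.
console.log(tombstoneRatio({ doc_count: 3324, doc_del_count: 580 }).toFixed(3));
```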

-Joan
