Posted to user@couchdb.apache.org by Jan Lehnardt <ja...@apache.org> on 2017/01/07 18:25:59 UTC

Re: Shards per node

Hi Stefan,

these are all great questions :)

> On 27 Dec 2016, at 20:21, Stefan du Fresne <st...@medicmobile.org> wrote:
> 
> Hi,
> 
> I’ve been looking into how CouchDB 2’s clustering works, and have a question about the performance characteristics of sharding.
> 
> As I understand it, when you first configure a cluster you pick the number of shards. Each node has to have at least one shard on it, and each shard can be duplicated N times, where best practice for N is considered 2-3x. How many shards you have is decided at DB creation time, and if you need more later (because maxnodes = shards * replicas) you need to replicate into a new cluster.

The default is 8 (your “number of shards” here is q in CouchDB/Dynamo parlance: https://github.com/apache/couchdb/blob/master/rel/overlay/etc/default.ini#L40)
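To make the maxnodes = shards * replicas arithmetic concrete, here is a minimal sketch (the function names are illustrative, not part of any CouchDB API; q is the shard count, n the replica count):

```python
def max_nodes(q, n):
    """Upper bound on useful cluster size: beyond q * n nodes,
    some nodes would host no shard copy at all."""
    return q * n

def shard_files_per_node(q, n, nodes):
    """Average number of shard files each node hosts,
    assuming shards are spread evenly across the cluster."""
    return (q * n) / nodes

# With the default q=8 and n=3, the cluster tops out at 24 nodes,
# and a 3-node cluster hosts 8 shard files per node.
```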


> I’m wondering if:
> - There is a recommended number of shards to use, or a recommended range to stay in

It varies a bit on the low end. E.g. I have a low-perf dual-core server with spinning disks, and I’m running it with q=2, as the disks don’t deliver any more IOPS and the two cores don’t have spare CPU time for most of my operations. Also, this cluster will never have to grow; it is a single-node q=2 setup.

On the high end, I’ll have to leave this to the Cloudant folks, as they have the most experience there. With a modern SSD-backed server, I’d do at least one shard per CPU core, if not 2x or even 3x depending on the number of views.
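That rule of thumb can be expressed as a tiny helper; this is only a sketch of the heuristic from this thread (the function name and the 3x view-heavy multiplier are illustrative assumptions, not an official formula):

```python
def suggested_q(cpu_cores, view_heavy=False):
    """Rule of thumb from the thread: on SSD-backed hardware, at
    least one shard per core, up to ~3x when there are many views."""
    multiplier = 3 if view_heavy else 1
    return cpu_cores * multiplier

# An 8-core SSD box: q=8 as a baseline, q=24 if view-heavy.
```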


> - If there are any known performance characteristics that map to how many shards you have. E.g., how differently would a one-node “cluster” perform with 2 shards compared to 16 or even 256? Is there any harm in configuring your cluster with 16 shards, say, even if you aren’t planning on having 16-32 nodes any time soon?

Sadly, this doesn’t currently exist, but if somebody is up for writing a little performance benchmark suite that we can run against a range of clusters from q=1 to q=256, I’d love to see it :)
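The core of such a suite would just be a repeatable timing harness that you point at each cluster configuration in turn. A minimal sketch (the harness itself is generic; wiring it up to actual CouchDB requests against q=1..256 clusters is left as an assumption):

```python
import time
import statistics

def bench(fn, warmup=3, runs=10):
    """Call fn repeatedly and return the median wall-clock
    duration in seconds; warmup iterations are discarded."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Usage: for each cluster under test, pass a closure that performs
# the operation of interest (bulk insert, view query, _changes read)
# and compare the medians across q values.
```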


> While replicating into a larger cluster is safe, we have a lot (thousands) of PouchDB clients who would have to re-download the entire changes feed to continue replicating. As these clients are on spotty / slow / expensive connections, ideally we’d be able to get the number of shards right the first time.

Good point. It’d be great if we had a better story for this. Maybe some clever hackery with host names and the cluster UUID can work? cc @rnewson @wohali @davisp @chewbranca


Best
Jan
-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/