Posted to user@couchdb.apache.org by Marcus <co...@wordit.com> on 2020/02/09 16:02:11 UTC

Maximum number of databases?

How many databases can be used without causing issues with replication and server performance?

I found two very different opinions. The PouchDB blog quotes 100K (based on a discussion about Cloudant in 2014). However, a Cloudant blog series from March 2019 recommends a maximum of 500.

Can anyone explain the huge difference? I understand it's going to depend on use cases, but a difference of 99,500 databases is significant.

500 is too few when databases are needed for read access control using roles: one for each user's personal document locker, one for public data (web), and one for a private group. At three databases per user, that leaves about 160 users.

Here are two excerpts from that Cloudant blog series of March 2019.

"Rule 4: Fewer databases are better than many

If you can, limit the number of databases per Cloudant account to 500 or fewer. While there is nothing magical about this particular number (Cloudant can safely handle more), there are several use cases that are adversely affected by large numbers of databases in an account."

"Rule 5: Avoid the “database per user” anti-pattern like the plague
If you’re building out a multi-user service on top of Cloudant, it is tempting to let each user store their data in a separate database under the application account. That works well, mostly, if the number of users is small."

Source: https://www.ibm.com/cloud/blog/cloudant-best-and-worst-practices-part-1

What are your personal experiences with large numbers of databases?

Marcus



Re: Maximum number of databases?

Posted by Sinan Gabel <si...@gmail.com>.
You should probably run a test that fits your expected number of
databases. Otherwise I would not speculate too much about what the
maximum is (I do not believe there is any built-in maximum number of
databases). If it works, you're done.

If you set up a cluster, one setting you may want to experiment with is
the number of shards per database, "q". Possibly reduce it to one, i.e.
"q = 1".

https://docs.couchdb.org/en/stable/config/cluster.html
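
Besides the cluster-wide default, the shard count can also be chosen per
database at creation time via the "q" query parameter of PUT /{db}. The
following is only a minimal sketch of that idea: the server URL and the
database name are placeholders, the helper names are made up for this
example, and admin authentication is left out.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def create_db_request(base_url: str, db_name: str, q: int = 1) -> Request:
    """Build a PUT request that creates `db_name` with `q` shards."""
    path = quote(db_name, safe="")
    query = urlencode({"q": q})
    return Request(f"{base_url.rstrip('/')}/{path}?{query}", method="PUT")


def create_db(base_url: str, db_name: str, q: int = 1) -> int:
    """Send the request and return the HTTP status code.

    Assumes the server accepts the request without credentials, which a
    real deployment normally will not; add authentication as needed.
    """
    with urlopen(create_db_request(base_url, db_name, q)) as resp:
        return resp.status
```

Calling create_db("http://localhost:5984", "userdb-example", q=1) would
ask a local node for a single-shard database; whether q = 1 actually
helps is something only a test against your own workload can show.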


Re: Maximum number of databases?

Posted by Stefan Klein <st...@gmail.com>.
Hi,

On Sun, 9 Feb 2020 at 17:02, Marcus <co...@wordit.com> wrote:
>
> "Rule 5: Avoid the “database per user” anti-pattern like the plague
> If you’re building out a multi-user service on top of Cloudant, it is tempting to let each user store their data in a separate database under the application account. That works well, mostly, if the number of users is small."
>
> Source: https://www.ibm.com/cloud/blog/cloudant-best-and-worst-practices-part-1

I think the important part is:

"Now add the need to derive cross-user analytics. The way you do that
is to replicate all the user databases into a single analytics DB. All
good. Now, this app suddenly becomes successful, and the number of
users grow from 150 to 20,000. Now we have 20,000 replications just to
keep the analytics DB current. If we also want to run in an
active-active DR setup, we add another 20,000 replications and
basically the system will stop functioning."

> What are your personal experiences with large numbers of databases?

We do have a large number of databases (the per-user approach,
self-hosted), but we do not have any continuous replications running to
sync the databases with an analytics database.

From my understanding a database _not_ in use is just some files lying
around in the filesystem.
So I do not think it makes sense to talk about "maximum number of
databases" but to talk about "maximum number of _active_ databases"
and "maximum number of concurrent replications".

With the newish scheduling replicator [1] even a large number of
replications should not be much of an issue, since they no longer all
run concurrently. Still, the quote from "rule 4" applies:

"The replicator scheduler has a limited number of simultaneous
replication jobs it is prepared to run. That means that as the number
of databases grows, the replication latency is likely to increase if
you try to replicate everything contained in an account."
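
To get a feel for that latency, here is a back-of-the-envelope sketch.
It is a simplified model, not the scheduler's actual algorithm: it
assumes at most max_jobs replications run at once and that every
interval at most max_churn waiting jobs are swapped in. The default
values below are what I believe the [replicator] config section
documents (max_jobs = 500, max_churn = 20, interval = 60000 ms); please
verify them against your version.

```python
import math


def worst_case_wait_seconds(total_jobs: int,
                            max_jobs: int = 500,
                            max_churn: int = 20,
                            interval_ms: int = 60000) -> float:
    """Rough wait for the last pending job to get its first turn.

    Simplified model: `max_jobs` jobs run immediately, and every
    `interval_ms` up to `max_churn` pending jobs are rotated in.
    """
    pending = max(0, total_jobs - max_jobs)
    cycles = math.ceil(pending / max_churn)
    return cycles * interval_ms / 1000
```

Under these assumptions, 20,000 continuous replications leave 19,500
jobs waiting, which takes 975 scheduling cycles, or about 16 hours,
before the last one runs for the first time. With 150 replications
nothing waits at all.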

Please take this with a grain of salt: I haven't played around with
the scheduling replicator yet. We have a working system in which we
trigger one-shot replications based on application knowledge, so far
fewer than "number of active users" replications are ever triggered.

[1]: http://docs.couchdb.org/en/master/replication/replicator.html
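
For illustration, a one-shot (non-continuous) replication can be
requested by POSTing a JSON body to the server's /_replicate endpoint.
This is only a sketch of the general idea, not our actual code: the
helper name, the server URL, and the database names are placeholders,
and authentication is omitted.

```python
import json
from urllib.request import Request, urlopen


def one_shot_replication_request(server_url: str, source: str, target: str) -> Request:
    """Build a POST to /_replicate for a single, non-continuous run."""
    body = {"source": source, "target": target, "continuous": False}
    return Request(
        f"{server_url.rstrip('/')}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def replicate_once(server_url: str, source: str, target: str) -> dict:
    """Send the request and return the server's decoded JSON response."""
    with urlopen(one_shot_replication_request(server_url, source, target)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The application would call replicate_once(...) only when it knows a
user's database has changed, instead of keeping one continuous
replication per database alive.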

-- 
Stefan

Re: Maximum number of databases?

Posted by ermouth <er...@gmail.com>.
In terms of the file system, a DB inside Couch is basically several
files. The number of files depends on the number of DB shards stored on
a particular node, and also on the number of design documents with
indices for the DB.

Opening a file is costly for the OS – so having a lot of sharded DBs is
bad in terms of opening/closing a lot of files.
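
As a rough illustration of how quickly this adds up, here is a small
estimate. It assumes one data file per shard plus one index file per
shard for each design document that defines indices; that is a
simplification for counting purposes, not an exact description of the
on-disk layout, and the function name is made up for this example.

```python
def estimated_files_per_node(databases: int,
                             shards_per_node: int,
                             ddocs_with_indexes: int) -> int:
    """Estimate the number of files a node may need to open.

    Simplified model: each shard has one data file, plus one index
    file for every design document that defines indices.
    """
    return databases * shards_per_node * (1 + ddocs_with_indexes)
```

With 20,000 databases, 8 shards per node and 2 indexed design documents
each, the estimate is 480,000 files; with a single shard per database
it drops to 60,000.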

Concurrent writing into many files is generally bad practice for
spindle devices, and it may also be suboptimal for some SSDs. Also,
since CouchDB uses an append-only strategy when inserting or updating
data, a steady inbound feed multiplexed across several DBs may cause
excessive storage fragmentation.

So – do test.

Also please read this at SO: https://bit.ly/2tJIJnN

ermouth

Re: Maximum number of databases?

Posted by Willem van der Westhuizen <wi...@kwantu.net>.
I was wondering whether the Cloudant recommendation was based on the
Cloudant superstructure or on the underlying CouchDB architecture, and
in particular how much weight continuous replication on each of those
carries in the assessment. Here is our use case:

Each user has their own user database, which is mirrored on the local
PouchDB client (in the browser, Electron offline, APK offline). We have
an "online" mode, in which data objects are read directly from CouchDB
(a shared database; we do not use per-user databases for access control,
but for improved performance over poor networks). Online mode saves any
document it reads in a local DB cache for working. In our use case,
which is a business process management reporting tool, there are always
a number of documents in the packet to be processed. It is important
that either all the documents save correctly or none at all. Therefore,
when the user does the final submit, all the documents are written to
the user's local copy of the user database, not the shared one. From
there, the list of documents is packaged into a transaction object
(which can be quite large) and replicated to the user's copy on the
server. The transaction manager then picks up the new document,
processes it, and saves it back into the shared database as part of the
transaction process.

Because we use a one-way, packet-driven replication that is triggered by
a save event rather than a continuous replication, we believe the
performance impact stays limited as long as the transaction manager can
process all the incoming documents effectively. And that can be scaled
up without too much difficulty.

I would be interested to hear whether there is any reason we should be
concerned.

Willem
