Posted to user@couchdb.apache.org by Glenn Bech <gl...@gmail.com> on 2011/05/26 12:22:46 UTC

Limit on the number of databases?

Hi,

I just want to ask if there are limits on the number of databases in Couch.
I am playing around with embedded Couch on Android and am thinking along the
lines of having one database per user, and using replication to push data
from the client to the server. This will provide an excellent "offline"
user experience.

This will of course not work if Couch does not handle an unlimited number of
databases well, performance-wise or otherwise.

Does this sound like a feasible design solution?

Regards,

Glenn

Re: Limit on the number of databases?

Posted by Marcos Ortiz <ml...@uci.cu>.
I think you should think carefully about your design.
What happens if your service grows to millions of users?

You might organize these users regionally, for example, so that
each database holds fewer document entries.

For example:
- DB 1: All U.S. users
- DB 2: All European users
- DB 3: All Indian users
- DB 4: All Chinese users
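
A minimal sketch of the regional split listed above. The region codes and database names are hypothetical and chosen only for illustration; nothing in the thread prescribes them.

```python
# Hypothetical region codes and database names, chosen only for illustration.
REGION_DATABASES = {
    "us": "users_us",
    "eu": "users_eu",
    "in": "users_in",
    "cn": "users_cn",
}


def database_for_region(region: str) -> str:
    """Return the name of the database that holds users from the given region."""
    try:
        return REGION_DATABASES[region.lower()]
    except KeyError:
        raise ValueError(f"no database configured for region {region!r}")
```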

Remember, it's only a suggestion.

Regards

-- 
Marcos Luis Ortiz Valmaseda
  Software Engineer (Distributed Systems)
  http://uncubanitolinuxero.blogspot.com


Re: Limit on the number of databases?

Posted by Brian Mitchell <bi...@gmail.com>.

I've done some testing and there are a couple of things to keep in mind.

First of all, CouchDB relies directly on the scalability of your filesystem. Each database in CouchDB means at least one file on disk. Since CouchDB currently stores them all in one directory, you'll need to make sure you select a filesystem that can handle your expected scale appropriately (many filesystems should be fine at the level of millions of files, but characteristics can differ, so do test this).
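
As a rough way to see how many database files a single directory already holds, something like the following sketch could be used. It assumes the databases are stored as files with a `.couch` extension directly inside one data directory; the directory path you pass in depends on your installation.

```python
from pathlib import Path


def count_database_files(data_dir: str) -> int:
    """Count files ending in .couch directly inside data_dir (no recursion)."""
    return sum(1 for p in Path(data_dir).glob("*.couch") if p.is_file())
```

Running this against a test directory filled with the expected number of files is one simple way to check how the chosen filesystem behaves at that scale.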

Another problem, one which I don't have an immediate answer for, is backup. While you could claim replication is enough for this, I'd say it isn't. The events you need backups for also include maliciously destroyed or manipulated data, or simply bugs. I'd rather not trust that my data will never get screwed up by the code that accesses it. Many backup systems are designed around a small number of files. Being able to roll back to a point in time with millions of files could be an extremely painful process. (I have ideas on how to solve this but it's still not an easy problem.)

Last but not least, consider the number of active databases you'll need at any single time. This can be split across many machines, of course, but it still adds up quickly. Open file descriptors are great, but not if you have to close and then reopen them all the time. A carefully tuned VM can manage many thousands without a problem, but I wouldn't push this too much higher. So if you have 15 machines and 30k active users in any single one-minute window, that would be 2k files open and active per machine.
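
The arithmetic in that last sentence can be sketched as follows, assuming one open file per active database and an even spread of active users across machines (both simplifications):

```python
import math


def open_files_per_machine(active_databases: int, machines: int) -> int:
    """Databases each machine must keep open if load is spread evenly (rounded up)."""
    if machines <= 0:
        raise ValueError("machines must be positive")
    return math.ceil(active_databases / machines)


print(open_files_per_machine(30_000, 15))  # 2000
```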

Brian. 

Re: Limit on the number of databases?

Posted by Jayesh Thakrar <j_...@yahoo.com>.
I am very new to CouchDB, but I am wondering if the approach below could work.

1. Have one or more independent clusters of CouchDB (start with 1 and add more
as needed).
2. Lay out a DB naming scheme.
3. Have an appropriate firewall/router/switch in front of the client
machines/network and have that router redirect the connection/traffic to the
appropriate server based on the URL of the REST request.
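
One way the routing step above might look is sketched below. The naming scheme (a cluster prefix before the first underscore of the database name) and the cluster addresses are assumptions made up for this example; a real router or proxy would apply the same kind of rule in its own configuration.

```python
from urllib.parse import urlsplit

# Hypothetical cluster addresses keyed by a hypothetical database-name prefix.
CLUSTERS = {
    "c0": "http://couch-a.example.com:5984",
    "c1": "http://couch-b.example.com:5984",
}


def cluster_for_url(url: str) -> str:
    """Pick a cluster from the database name, i.e. the first path segment."""
    db_name = urlsplit(url).path.strip("/").split("/")[0]
    prefix = db_name.split("_", 1)[0]
    if prefix not in CLUSTERS:
        raise ValueError(f"cannot route database {db_name!r}")
    return CLUSTERS[prefix]
```

With this scheme, a request for `/c1_bob/_all_docs` would be sent to the second cluster, and adding a cluster means adding one prefix to the table.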

-- Jayesh

Re: Limit on the number of databases?

Posted by Ajai Khattri <co...@bitblit.net>.
Since we're discussing large numbers of mobile clients: would it be 
possible to do replication in "batches" from a client (i.e. not 
immediately) so that server resources are not continuously tied up?

On another project I've worked on where syncing from mobile clients was
involved, we developed a scheme where the server informs each client, at
the end of the sync process, when it should next sync. It allowed
us to stagger the syncing of large numbers of clients across a 24-hour period.
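
The details of that scheme are not given here, so the following is only a sketch of one way a server could spread clients over a day: hash the client ID into one of a fixed number of evenly spaced slots. The slot count and the use of a hash are assumptions for illustration.

```python
import hashlib

SECONDS_PER_DAY = 24 * 60 * 60


def sync_offset_seconds(client_id: str, slots: int = 288) -> int:
    """Map a client to one of `slots` evenly spaced start times within a day.

    The result is a number of seconds after the start of the day.
    """
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % slots
    return slot * (SECONDS_PER_DAY // slots)
```

With the default of 288 slots, clients land on five-minute boundaries, and the same client always gets the same slot.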


-- 
Aj.



Re: Limit on the number of databases?

Posted by Brian Mitchell <bi...@gmail.com>.

On Thursday, May 26, 2011 at 11:15 AM, Sam Bisbee wrote:

>  - On a non-performance note, you can't do map/reduce across
> databases. If you plan on referencing between them or combining data,
> then you're probably going to have an index database that some client
> code puts its results into.
That, or it's reasonable to build a data warehouse which does this in an aggregate database (via replication, possibly filtered). One benefit of having smaller databases is that view generation is cheaper if you want to avoid downtime and don't want to deal with stale views (not an option in many cases).

I've been investigating this too and will report on success if I achieve it, though at worst I'll just fall back to a larger database with BigCouch and a very large Q-value (shard count).
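
For illustration, a filtered replication from a per-user database into an aggregate database could be requested with a JSON body like the one built below, which would be POSTed to CouchDB's `/_replicate` endpoint. The database names and the filter name used in the usage note are hypothetical; the filter function itself would have to exist in a design document in the source database.

```python
import json


def replication_request(user_db: str, aggregate_db: str, filter_name: str) -> str:
    """Build a JSON body for a filtered replication into an aggregate database."""
    body = {
        "source": user_db,
        "target": aggregate_db,
        # "designdoc/filtername", defined in the source database (hypothetical here)
        "filter": filter_name,
    }
    return json.dumps(body)
```

For example, `replication_request("user_alice", "warehouse", "app/for_warehouse")` produces a body asking for only the documents accepted by that filter.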

Brian. 

Re: Limit on the number of databases?

Posted by Sam Bisbee <sa...@sbisbee.com>.

I have been looking into this recently with an application design that
would use thousands of databases - tens to hundreds of thousands after
a year of usage. From what I have found there are a few considerations
to weigh...

  - Since each database is a file, you are going to need one file
descriptor per active database. CouchDB and Erlang have their own
internal maximums that you can play with. Once their max is reached,
CouchDB starts to close the oldest file descriptors.

  - Since CouchDB closes file descriptors, if you have a bunch of
active databases you run the risk of slowing down your machine.
Increasing the number of open file descriptors is not always the best
solution, because you could start to see OS level performance issues.

  - Don't forget your OS's max open files. Take a look at ulimit or
PAM if you're on a *nix machine.

  - On a non-performance note, you can't do map/reduce across
databases. If you plan on referencing between them or combining data,
then you're probably going to have an index database that some client
code puts its results into.
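
As a quick check of the OS-level limit mentioned above, the sketch below reads the open-file limit with Python's standard `resource` module (Unix only). Note the assumption: it reports the limit for the Python process that runs it, which only matches CouchDB's limit if CouchDB is started under the same user and limit settings.

```python
import resource
from typing import Tuple


def open_file_limits() -> Tuple[int, int]:
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = open_file_limits()
print(f"soft limit: {soft}, hard limit: {hard}")
```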

Cheers,

-- 
Sam Bisbee
www.sbisbee.com

Re: Limit on the number of databases?

Posted by Ajai Khattri <co...@bitblit.net>.
As someone else already pointed out, there is the potential for lots of 
users, so it needs some more thought.

I am also working on something similar (I'm an Android developer) and also
thought about one database per user. But I'm thinking it might be better to
think about sharding from day one, so maybe having a web service that your
app calls before setting up replication might be a better way to go. The
web service could assign a specific server to each user so you could
easily switch to a new server when you start hitting limits.
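
A minimal in-memory sketch of such an assignment service follows. The class name, the per-server capacity, and the fill-in-order policy are all assumptions for illustration; a real service would persist the assignments.

```python
from typing import Dict, List


class ServerAssigner:
    """Assign each user to a server, filling servers in order up to a capacity."""

    def __init__(self, servers: List[str], capacity: int) -> None:
        self.servers = servers
        self.capacity = capacity
        self.assignments: Dict[str, str] = {}
        self.counts: Dict[str, int] = {s: 0 for s in servers}

    def server_for(self, user: str) -> str:
        # Existing users keep their server so replication targets stay stable.
        if user in self.assignments:
            return self.assignments[user]
        for server in self.servers:
            if self.counts[server] < self.capacity:
                self.counts[server] += 1
                self.assignments[user] = server
                return server
        raise RuntimeError("all servers are full; add a new server")
```

The app would call something like `server_for(user)` once before setting up replication, and new users start landing on the next server as soon as the earlier ones reach capacity.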



-- 
Aj.