Posted to user@couchdb.apache.org by Kurt Mackey <ku...@arstechnica.com> on 2009/01/10 18:19:48 UTC

Considerations for lots and lots of databases?

I really, really like not having to deal with schema updates.  Given that I don't have to worry about this in quite the same way with Couch, I've been wondering what the downsides are to splitting data into multiple databases on a SaaS project.  The idea is that each account/customer would get their own database, rather than simply marking documents with an "owner".

Now, the obvious downside is that it makes it more difficult to do cross-customer queries.  But what other problems are there with this idea?  Assuming that there were lots and lots and lots of accounts, what performance implications are there to giving each their own DB rather than making them all share?

-Kurt

Re: Considerations for lots and lots of databases?

Posted by Jan Lehnardt <ja...@apache.org>.
On 10 Jan 2009, at 19:42, Robert Koberg wrote:

>
> On Jan 10, 2009, at 1:16 PM, Jan Lehnardt wrote:
>
>>
>> On 10 Jan 2009, at 19:07, Chris Anderson wrote:
>>
>>> On Sat, Jan 10, 2009 at 9:19 AM, Kurt Mackey  
>>> <ku...@arstechnica.com> wrote:
>>>> Now, the obvious downside is that it makes it more difficult to  
>>>> do cross-customer queries.  But what other problems are there  
>>>> with this idea?  Assuming that there were lots and lots and lots  
>>>> of accounts, what performance implications are there to giving  
>>>> each their own DB rather than making them all share?
>>>>
>>>
>>> CouchDB keeps each database in its own file, so if you can spread  
>>> the
>>> files across disks (using symlinks for now) you should get better
>>> performance with many databases. DB-per-user is a good pattern also
>>> because it means you can let users replicate their entire account
>>> locally, without worrying about filtering out extra data.
>>>
>>
>> In addition, if you name your databases "foo/bar" CouchDB will  
>> actually
>> create that as a structure on disk to avoid running into trouble with
>> filesystems that don't like a lot of files in a single directory.
>
> Is this new? From:

It's an undocumented feature that has been in for a while :)

Cheers
Jan
--
>
>
> http://wiki.apache.org/couchdb/HTTP_database_API
>
> Specifically:
>
> "Note also that a / character in a DB name must be escaped when used  
> in a URL; if your DB is named his/her then it will be available at  
> http://localhost:5984/his%2Fher."
>
> I thought it would just create the directory 'his%2Fher'
>
> If new, do you still need to escape the slash in a URL?
>
> thanks,
> -Rob
>
>>
>>
>> Setups with 1 million databases representing users have been tested
>> successfully.
>>
>> Cheers
>> Jan
>> --
>
>


Re: Considerations for lots and lots of databases?

Posted by Antony Blakey <an...@gmail.com>.
On 11/01/2009, at 5:12 AM, Robert Koberg wrote:

> "Note also that a / character in a DB name must be escaped when used  
> in a URL; if your DB is named his/her then it will be available at  
> http://localhost:5984/his%2Fher."
>
> I thought it would just create the directory 'his%2Fher'
>
> If new, do you still need to escape the slash in a URL?

Couch will see the name "his/her" because the URL will be decoded. The  
%2F is purely a URL encoding issue to ensure that the dbname is a  
single URL path segment.
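
To illustrate the encoding step, here is a minimal Python sketch. The host and port are the ones from the wiki example; `db_url` is a hypothetical helper for illustration, not part of CouchDB or any client library.

```python
from urllib.parse import quote, unquote

def db_url(dbname, base="http://localhost:5984"):
    # safe="" forces "/" to be percent-encoded, so the whole database
    # name stays inside a single URL path segment.
    return base + "/" + quote(dbname, safe="")

url = db_url("his/her")
print(url)
# Decoding the last path segment recovers the name the server sees.
print(unquote(url.rsplit("/", 1)[1]))
```

The client encodes the slash only for transport; once the segment is decoded, the database name is "his/her" again.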

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

75% of statistics are made up on the spot.



Re: Considerations for lots and lots of databases?

Posted by Robert Koberg <ro...@koberg.com>.
On Jan 10, 2009, at 1:16 PM, Jan Lehnardt wrote:

>
> On 10 Jan 2009, at 19:07, Chris Anderson wrote:
>
>> On Sat, Jan 10, 2009 at 9:19 AM, Kurt Mackey <ku...@arstechnica.com>  
>> wrote:
>>> Now, the obvious downside is that it makes it more difficult to do  
>>> cross-customer queries.  But what other problems are there with  
>>> this idea?  Assuming that there were lots and lots and lots of  
>>> accounts, what performance implications are there to giving each  
>>> their own DB rather than making them all share?
>>>
>>
>> CouchDB keeps each database in its own file, so if you can spread  
>> the
>> files across disks (using symlinks for now) you should get better
>> performance with many databases. DB-per-user is a good pattern also
>> because it means you can let users replicate their entire account
>> locally, without worrying about filtering out extra data.
>>
>
> In addition, if you name your databases "foo/bar" CouchDB will  
> actually
> create that as a structure on disk to avoid running into trouble with
> filesystems that don't like a lot of files in a single directory.

Is this new? From:

http://wiki.apache.org/couchdb/HTTP_database_API

Specifically:

"Note also that a / character in a DB name must be escaped when used  
in a URL; if your DB is named his/her then it will be available at  
http://localhost:5984/his%2Fher."

I thought it would just create the directory 'his%2Fher'

If new, do you still need to escape the slash in a URL?

thanks,
-Rob

>
>
> Setups with 1 million databases representing users have been tested
> successfully.
>
> Cheers
> Jan
> --


Re: Considerations for lots and lots of databases?

Posted by Antony Blakey <an...@gmail.com>.
On 11/01/2009, at 5:53 AM, Dean Landolt wrote:

> On Sat, Jan 10, 2009 at 2:03 PM, Jan Lehnardt <ja...@apache.org> wrote:
>
>>
>> On 10 Jan 2009, at 19:41, Antony Blakey wrote:
>>
>>> In addition, if you name your databases "foo/bar" CouchDB will  
>>> actually
>>>> create that as a structure on disk to avoid running into trouble  
>>>> with
>>>> filesystems that don't like a lot of files in a single directory.
>>>>
>>>> Setups with 1 million databases representing users have been tested
>>>> successfully.
>>>>
>>>
>>> Hmmm. The filesystem layout changes that I've proofed, and had  
>>> accepted
>>> for incorporation, change this. Filenames are munged to allow  
>>> arbitrary
>>> characters, and so '/' is escaped.
>>>
>>
>> Would it be possible to keep the current behaviour or deal with the
>> issue of lots of files in a single directory in another way?
>>
>> Cheers
>> Jan
>> --
>>
>
> I understand the need to escape everything, but would it be too  
> difficult to
> split on '/' and escape each part? This is the first I'm  
> learning of
> this feature and I'm already missing it.

My concern is that this is an implementation leakage. And if applied  
uniformly (say to view names) it would get messy.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Only two things are infinite, the universe and human stupidity, and  
I'm not sure about the former.
  -- Albert Einstein


Re: Considerations for lots and lots of databases?

Posted by Dean Landolt <de...@deanlandolt.com>.
On Sat, Jan 10, 2009 at 2:03 PM, Jan Lehnardt <ja...@apache.org> wrote:

>
> On 10 Jan 2009, at 19:41, Antony Blakey wrote:
>
>> In addition, if you name your databases "foo/bar" CouchDB will actually
>>> create that as a structure on disk to avoid running into trouble with
>>> filesystems that don't like a lot of files in a single directory.
>>>
>>> Setups with 1 million databases representing users have been tested
>>> successfully.
>>>
>>
>> Hmmm. The filesystem layout changes that I've proofed, and had accepted
>> for incorporation, change this. Filenames are munged to allow arbitrary
>> characters, and so '/' is escaped.
>>
>
> Would it be possible to keep the current behaviour or deal with the
> issue of lots of files in a single directory in another way?
>
> Cheers
> Jan
> --
>

I understand the need to escape everything, but would it be too difficult to
split on '/' and escape each part? This is the first I'm learning of
this feature and I'm already missing it.
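
A minimal sketch of the split-and-escape idea, in Python for illustration only (CouchDB itself is written in Erlang, and this is not its actual code). `dbname_to_relpath` is a hypothetical name, and percent-encoding is just one possible escaping scheme.

```python
import os
from urllib.parse import quote

def dbname_to_relpath(dbname):
    # Split on "/" so each part becomes a directory level, then
    # percent-encode every part on its own so that other special
    # characters cannot leak into the filesystem path.
    parts = [quote(part, safe="") for part in dbname.split("/")]
    return os.path.join(*parts) + ".couch"

print(dbname_to_relpath("users/a b"))
```

A real implementation would also have to decide what to do with empty parts and with "." or "..", which this sketch passes through unchanged.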

Re: Considerations for lots and lots of databases?

Posted by Jan Lehnardt <ja...@apache.org>.
On 10 Jan 2009, at 19:41, Antony Blakey wrote:
>> In addition, if you name your databases "foo/bar" CouchDB will  
>> actually
>> create that as a structure on disk to avoid running into trouble with
>> filesystems that don't like a lot of files in a single directory.
>>
>> Setups with 1 million databases representing users have been tested
>> successfully.
>
> Hmmm. The filesystem layout changes that I've proofed, and had  
> accepted for incorporation, change this. Filenames are munged to  
> allow arbitrary characters, and so '/' is escaped.

Would it be possible to keep the current behaviour or deal with the
issue of lots of files in a single directory in another way?

Cheers
Jan
--

Re: Considerations for lots and lots of databases?

Posted by Antony Blakey <an...@gmail.com>.
On 11/01/2009, at 4:46 AM, Jan Lehnardt wrote:

>
> On 10 Jan 2009, at 19:07, Chris Anderson wrote:
>
>> On Sat, Jan 10, 2009 at 9:19 AM, Kurt Mackey <ku...@arstechnica.com>  
>> wrote:
>>> Now, the obvious downside is that it makes it more difficult to do  
>>> cross-customer queries.  But what other problems are there with  
>>> this idea?  Assuming that there were lots and lots and lots of  
>>> accounts, what performance implications are there to giving each  
>>> their own DB rather than making them all share?
>>>
>>
>> CouchDB keeps each database in its own file, so if you can spread  
>> the
>> files across disks (using symlinks for now) you should get better
>> performance with many databases. DB-per-user is a good pattern also
>> because it means you can let users replicate their entire account
>> locally, without worrying about filtering out extra data.
>>
>
> In addition, if you name your databases "foo/bar" CouchDB will  
> actually
> create that as a structure on disk to avoid running into trouble with
> filesystems that don't like a lot of files in a single directory.
>
> Setups with 1 million databases representing users have been tested
> successfully.

Hmmm. The filesystem layout changes that I've proofed, and had  
accepted for incorporation, change this. Filenames are munged to  
allow arbitrary characters, and so '/' is escaped.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

Nothing is really work unless you would rather be doing something else.
   -- J. M. Barrie



Re: Considerations for lots and lots of databases?

Posted by Jan Lehnardt <ja...@apache.org>.
On 10 Jan 2009, at 19:07, Chris Anderson wrote:

> On Sat, Jan 10, 2009 at 9:19 AM, Kurt Mackey <ku...@arstechnica.com>  
> wrote:
>> Now, the obvious downside is that it makes it more difficult to do  
>> cross-customer queries.  But what other problems are there with  
>> this idea?  Assuming that there were lots and lots and lots of  
>> accounts, what performance implications are there to giving each  
>> their own DB rather than making them all share?
>>
>
> CouchDB keeps each database in its own file, so if you can spread the
> files across disks (using symlinks for now) you should get better
> performance with many databases. DB-per-user is a good pattern also
> because it means you can let users replicate their entire account
> locally, without worrying about filtering out extra data.
>

In addition, if you name your databases "foo/bar" CouchDB will actually
create that as a structure on disk to avoid running into trouble with
filesystems that don't like a lot of files in a single directory.

Setups with 1 million databases representing users have been tested
successfully.
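
One way to use this, sketched in Python under stated assumptions: derive a couple of prefix levels from a hash of the account id so that no single directory fills up. The function name, the "users" prefix, and the choice of SHA-256 are all hypothetical, and the sketch assumes account ids contain only characters that are legal in database names.

```python
import hashlib

def sharded_db_name(account_id, levels=2, width=2):
    # Hash the account id so that databases spread evenly across
    # the prefix directories, whatever the ids look like.
    digest = hashlib.sha256(account_id.encode("utf-8")).hexdigest()
    prefixes = [digest[i * width:(i + 1) * width] for i in range(levels)]
    # The leading "users" part keeps the name starting with a letter.
    return "/".join(["users"] + prefixes + [account_id])

print(sharded_db_name("kurt"))
```

With two levels of two hex characters there are 256 * 256 = 65,536 leaf directories, so a million databases would average roughly 15 per directory. This only helps as long as "/" in a name maps to a directory on disk.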

Cheers
Jan
--

Re: Considerations for lots and lots of databases?

Posted by Chris Anderson <jc...@gmail.com>.
On Sat, Jan 10, 2009 at 9:19 AM, Kurt Mackey <ku...@arstechnica.com> wrote:
> Now, the obvious downside is that it makes it more difficult to do cross-customer queries.  But what other problems are there with this idea?  Assuming that there were lots and lots and lots of accounts, what performance implications are there to giving each their own DB rather than making them all share?
>

CouchDB keeps each database in its own file, so if you can spread the
files across disks (using symlinks for now) you should get better
performance with many databases. DB-per-user is a good pattern also
because it means you can let users replicate their entire account
locally, without worrying about filtering out extra data.
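
As a rough sketch of what DB-per-user looks like from the client side, the Python below builds (but does not send) the request that would create one database per account. The server address, the "account_" prefix, and the helper name are assumptions for illustration.

```python
import urllib.request

COUCH = "http://localhost:5984"  # assumed address of a local CouchDB

def create_db_request(account):
    # One database per account; "account_" is a hypothetical prefix.
    dbname = "account_" + account
    # PUT /<dbname> asks CouchDB to create the database.
    return urllib.request.Request(COUCH + "/" + dbname, method="PUT")

req = create_db_request("kurt")
print(req.get_method(), req.full_url)
# To send it against a running server:
# urllib.request.urlopen(req)
```

Because each account's documents live in their own database, replicating that one database is enough to hand a user their entire account.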


-- 
Chris Anderson
http://jchris.mfdz.com

Re: Considerations for lots and lots of databases?

Posted by Kurt Mackey <ku...@arstechnica.com>.
Probably something Ruby based.  I'm currently poking at Rails + RelaxDB.


On 1/10/09 12:15 PM, "Flinn Mueller" <th...@gmail.com> wrote:

Kurt, what language/framework are you planning to use for the project?

On Jan 10, 2009, at 12:19 PM, Kurt Mackey wrote:

> I really, really like not having to deal with schema updates.  Given
> that I don't have to worry about this in quite the same way with
> Couch, I've been wondering what the downsides are to splitting data
> into multiple databases on a SaaS project.  The idea is that each
> account/customer would get their own database, rather than simply
> marking documents with an "owner".
>
> Now, the obvious downside is that it makes it more difficult to do
> cross-customer queries.  But what other problems are there with this
> idea?  Assuming that there were lots and lots and lots of accounts,
> what performance implications are there to giving each their own DB
> rather than making them all share?
>
> -Kurt



Re: Considerations for lots and lots of databases?

Posted by Flinn Mueller <th...@gmail.com>.
Kurt, what language/framework are you planning to use for the project?

On Jan 10, 2009, at 12:19 PM, Kurt Mackey wrote:

> I really, really like not having to deal with schema updates.  Given  
> that I don't have to worry about this in quite the same way with  
> Couch, I've been wondering what the downsides are to splitting data  
> into multiple databases on a SaaS project.  The idea is that each  
> account/customer would get their own database, rather than simply  
> marking documents with an "owner".
>
> Now, the obvious downside is that it makes it more difficult to do  
> cross-customer queries.  But what other problems are there with this  
> idea?  Assuming that there were lots and lots and lots of accounts,  
> what performance implications are there to giving each their own DB  
> rather than making them all share?
>
> -Kurt