You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@couchdb.apache.org by Benoit Chesneau <bc...@gmail.com> on 2012/12/13 17:06:59 UTC

database design question: concurrent writes

Hi all,


This morning I was back reading a lot of fundamentals about  databases and
such and was asking myself how we could increase the number of concurrent
writes.

These days the theory is that it will be solved by sharding the databases
in multiples database files and merging results of the queries. Since the
databases will be shareded then the writes on the same db will be
concurrents. A map of the shards willl be kept aside. All of this thanks to
the introduction of bigcouch.

The question I have is why don't we already do that? Ie balancing datas on
different files on one db? for example the db folder could be

database/XY.couch

where XY are the first letters of an id or content hash or any consistent
hashing method.

I am currently asking myself such question because I am wondering how will
the backup works when couchdb will be used as a single node. How to backup
only one db without having to query for the mapping and such? How to keep
it it simple.

Related to that why did bigcouch used that design? Why mapping shards in a
db database instead of having some kind of natural balancing on the fs and
having a consistent hashing algorithm used to balance on different
machines/vms as well ?


- benoît

Re: database design question: concurrent writes

Posted by Hans J Schroeder <hs...@cloudno.de>.
On Dec 13, 2012, at 6:41 PM, Robert Newson <rn...@apache.org> wrote:

> Two databases can have a different number of shards, different numbers
> of replicas of each shard, different locations for those shards, we
> might move shards over time as we add or remove nodes from the
> cluster. I don't see how you can do any of that without a document
> describing the layout.
> 
> B.
> 
> On 13 December 2012 17:30, Benoit Chesneau <bc...@gmail.com> wrote:
>> On Thu, Dec 13, 2012 at 5:46 PM, Robert Newson <rn...@apache.org> wrote:
>> 
>>> Views are also sharded.
>>> 
>>> It's common for a node to host multiple shards of the same database,
>>> so we already have this 'concurrent writes' notion, if I've
>>> interpreted it correctly.
>>> 
>>> 
>> Well my question are more related why using a mapping ? And how to keep the
>> backup of one database easy for a user. Possibly without relying on a
>> mapping stored aside.
>> 
>> Other question is  why bigcouch choose that design vs the one I propose.
>> 
>> @Hans since db are shared and views are done / db ,views indexations are
>> also concurrent.
>> 
>> - benoît
>> 
>> 
>> 
>>> On 13 December 2012 16:36, Hans J Schroeder <hs...@cloudno.de> wrote:
>>>> 
>>>> On Dec 13, 2012, at 5:06 PM, Benoit Chesneau <bc...@gmail.com>
>>> wrote:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> 
>>>>> This morning I was back reading a lot of fundamentals about  databases
>>> and
>>>>> such and was asking myself how we could increase the number of
>>> concurrent
>>>>> writes.
>>>>> 
>>>>> These days the theory is that it will be solved by sharding the
>>> databases
>>>>> in multiples database files and merging results of the queries. Since
>>> the
>>>>> databases will be shareded then the writes on the same db will be
>>>>> concurrents. A map of the shards willl be kept aside. All of this
>>> thanks to
>>>>> the introduction of bigcouch.
>>>>> 
>>>>> The question I have is why don't we already do that? Ie balancing datas
>>> on
>>>>> different files on one db? for example the db folder could be
>>>>> 
>>>>> database/XY.couch
>>>>> 
>>>>> where XY are the first letters of an id or content hash or any
>>> consistent
>>>>> hashing method.
>>>>> 
>>>>> I am currently asking myself such question because I am wondering how
>>> will
>>>>> the backup works when couchdb will be used as a single node. How to
>>> backup
>>>>> only one db without having to query for the mapping and such? How to
>>> keep
>>>>> it it simple.
>>>>> 
>>>>> Related to that why did bigcouch used that design? Why mapping shards
>>> in a
>>>>> db database instead of having some kind of natural balancing on the fs
>>> and
>>>>> having a consistent hashing algorithm used to balance on different
>>>>> machines/vms as well ?
>>>>> 
>>>>> 
>>>>> - benoît
>>>> 
>>>> 
>>>> Hi,
>>>> 
>>>> That's like "horizontal partitioning" in conventional databases and I
>>> think its a great idea. Having a writer process for each partition will
>>> make it scale.
>>>> 
>>>> Does Bigcouch have anything for the view files too or are they just
>>> sharding the backing files?
>>>> 
>>>> - Hans
>>> 

Thanks for the clarification about the views. 

Its all about what we want to have. For concurrent writes, a simple shuffling like Benoit has described, would be an efficient solution. For configurable clusters, a mapping store of some kind is needed.

- Hans

Re: database design question: concurrent writes

Posted by Randall Leeds <ra...@gmail.com>.
On Thu, Dec 13, 2012 at 9:41 AM, Robert Newson <rn...@apache.org> wrote:

> Two databases can have a different number of shards, different numbers
> of replicas of each shard, different locations for those shards, we
> might move shards over time as we add or remove nodes from the
> cluster. I don't see how you can do any of that without a document
> describing the layout.
>

tl;dr there's a mapping because the administrator can change it.

Do LSM trees have any concurrency in their use of multiple files, or are
they accessed consecutively during writes?

Re: database design question: concurrent writes

Posted by Robert Newson <rn...@apache.org>.
Two databases can have a different number of shards, different numbers
of replicas of each shard, different locations for those shards, we
might move shards over time as we add or remove nodes from the
cluster. I don't see how you can do any of that without a document
describing the layout.

B.

On 13 December 2012 17:30, Benoit Chesneau <bc...@gmail.com> wrote:
> On Thu, Dec 13, 2012 at 5:46 PM, Robert Newson <rn...@apache.org> wrote:
>
>> Views are also sharded.
>>
>> It's common for a node to host multiple shards of the same database,
>> so we already have this 'concurrent writes' notion, if I've
>> interpreted it correctly.
>>
>>
> Well my question are more related why using a mapping ? And how to keep the
> backup of one database easy for a user. Possibly without relying on a
> mapping stored aside.
>
> Other question is  why bigcouch choose that design vs the one I propose.
>
> @Hans since db are shared and views are done / db ,views indexations are
> also concurrent.
>
> - benoît
>
>
>
>> On 13 December 2012 16:36, Hans J Schroeder <hs...@cloudno.de> wrote:
>> >
>> > On Dec 13, 2012, at 5:06 PM, Benoit Chesneau <bc...@gmail.com>
>> wrote:
>> >
>> >> Hi all,
>> >>
>> >>
>> >> This morning I was back reading a lot of fundamentals about  databases
>> and
>> >> such and was asking myself how we could increase the number of
>> concurrent
>> >> writes.
>> >>
>> >> These days the theory is that it will be solved by sharding the
>> databases
>> >> in multiples database files and merging results of the queries. Since
>> the
>> >> databases will be shareded then the writes on the same db will be
>> >> concurrents. A map of the shards willl be kept aside. All of this
>> thanks to
>> >> the introduction of bigcouch.
>> >>
>> >> The question I have is why don't we already do that? Ie balancing datas
>> on
>> >> different files on one db? for example the db folder could be
>> >>
>> >> database/XY.couch
>> >>
>> >> where XY are the first letters of an id or content hash or any
>> consistent
>> >> hashing method.
>> >>
>> >> I am currently asking myself such question because I am wondering how
>> will
>> >> the backup works when couchdb will be used as a single node. How to
>> backup
>> >> only one db without having to query for the mapping and such? How to
>> keep
>> >> it it simple.
>> >>
>> >> Related to that why did bigcouch used that design? Why mapping shards
>> in a
>> >> db database instead of having some kind of natural balancing on the fs
>> and
>> >> having a consistent hashing algorithm used to balance on different
>> >> machines/vms as well ?
>> >>
>> >>
>> >> - benoît
>> >
>> >
>> > Hi,
>> >
>> > That's like "horizontal partitioning" in conventional databases and I
>> think its a great idea. Having a writer process for each partition will
>> make it scale.
>> >
>> > Does Bigcouch have anything for the view files too or are they just
>> sharding the backing files?
>> >
>> > - Hans
>>

Re: database design question: concurrent writes

Posted by Benoit Chesneau <bc...@gmail.com>.
On Thu, Dec 13, 2012 at 5:46 PM, Robert Newson <rn...@apache.org> wrote:

> Views are also sharded.
>
> It's common for a node to host multiple shards of the same database,
> so we already have this 'concurrent writes' notion, if I've
> interpreted it correctly.
>
>
Well my question are more related why using a mapping ? And how to keep the
backup of one database easy for a user. Possibly without relying on a
mapping stored aside.

Other question is  why bigcouch choose that design vs the one I propose.

@Hans since db are shared and views are done / db ,views indexations are
also concurrent.

- benoît



> On 13 December 2012 16:36, Hans J Schroeder <hs...@cloudno.de> wrote:
> >
> > On Dec 13, 2012, at 5:06 PM, Benoit Chesneau <bc...@gmail.com>
> wrote:
> >
> >> Hi all,
> >>
> >>
> >> This morning I was back reading a lot of fundamentals about  databases
> and
> >> such and was asking myself how we could increase the number of
> concurrent
> >> writes.
> >>
> >> These days the theory is that it will be solved by sharding the
> databases
> >> in multiples database files and merging results of the queries. Since
> the
> >> databases will be shareded then the writes on the same db will be
> >> concurrents. A map of the shards willl be kept aside. All of this
> thanks to
> >> the introduction of bigcouch.
> >>
> >> The question I have is why don't we already do that? Ie balancing datas
> on
> >> different files on one db? for example the db folder could be
> >>
> >> database/XY.couch
> >>
> >> where XY are the first letters of an id or content hash or any
> consistent
> >> hashing method.
> >>
> >> I am currently asking myself such question because I am wondering how
> will
> >> the backup works when couchdb will be used as a single node. How to
> backup
> >> only one db without having to query for the mapping and such? How to
> keep
> >> it it simple.
> >>
> >> Related to that why did bigcouch used that design? Why mapping shards
> in a
> >> db database instead of having some kind of natural balancing on the fs
> and
> >> having a consistent hashing algorithm used to balance on different
> >> machines/vms as well ?
> >>
> >>
> >> - benoît
> >
> >
> > Hi,
> >
> > That's like "horizontal partitioning" in conventional databases and I
> think its a great idea. Having a writer process for each partition will
> make it scale.
> >
> > Does Bigcouch have anything for the view files too or are they just
> sharding the backing files?
> >
> > - Hans
>

Re: database design question: concurrent writes

Posted by Robert Newson <rn...@apache.org>.
Views are also sharded.

It's common for a node to host multiple shards of the same database,
so we already have this 'concurrent writes' notion, if I've
interpreted it correctly.

On 13 December 2012 16:36, Hans J Schroeder <hs...@cloudno.de> wrote:
>
> On Dec 13, 2012, at 5:06 PM, Benoit Chesneau <bc...@gmail.com> wrote:
>
>> Hi all,
>>
>>
>> This morning I was back reading a lot of fundamentals about  databases and
>> such and was asking myself how we could increase the number of concurrent
>> writes.
>>
>> These days the theory is that it will be solved by sharding the databases
>> in multiples database files and merging results of the queries. Since the
>> databases will be shareded then the writes on the same db will be
>> concurrents. A map of the shards willl be kept aside. All of this thanks to
>> the introduction of bigcouch.
>>
>> The question I have is why don't we already do that? Ie balancing datas on
>> different files on one db? for example the db folder could be
>>
>> database/XY.couch
>>
>> where XY are the first letters of an id or content hash or any consistent
>> hashing method.
>>
>> I am currently asking myself such question because I am wondering how will
>> the backup works when couchdb will be used as a single node. How to backup
>> only one db without having to query for the mapping and such? How to keep
>> it it simple.
>>
>> Related to that why did bigcouch used that design? Why mapping shards in a
>> db database instead of having some kind of natural balancing on the fs and
>> having a consistent hashing algorithm used to balance on different
>> machines/vms as well ?
>>
>>
>> - benoît
>
>
> Hi,
>
> That's like "horizontal partitioning" in conventional databases and I think its a great idea. Having a writer process for each partition will make it scale.
>
> Does Bigcouch have anything for the view files too or are they just sharding the backing files?
>
> - Hans

Re: database design question: concurrent writes

Posted by Hans J Schroeder <hs...@cloudno.de>.
On Dec 13, 2012, at 5:06 PM, Benoit Chesneau <bc...@gmail.com> wrote:

> Hi all,
> 
> 
> This morning I was back reading a lot of fundamentals about  databases and
> such and was asking myself how we could increase the number of concurrent
> writes.
> 
> These days the theory is that it will be solved by sharding the databases
> in multiples database files and merging results of the queries. Since the
> databases will be shareded then the writes on the same db will be
> concurrents. A map of the shards willl be kept aside. All of this thanks to
> the introduction of bigcouch.
> 
> The question I have is why don't we already do that? Ie balancing datas on
> different files on one db? for example the db folder could be
> 
> database/XY.couch
> 
> where XY are the first letters of an id or content hash or any consistent
> hashing method.
> 
> I am currently asking myself such question because I am wondering how will
> the backup works when couchdb will be used as a single node. How to backup
> only one db without having to query for the mapping and such? How to keep
> it it simple.
> 
> Related to that why did bigcouch used that design? Why mapping shards in a
> db database instead of having some kind of natural balancing on the fs and
> having a consistent hashing algorithm used to balance on different
> machines/vms as well ?
> 
> 
> - benoît


Hi, 

That's like "horizontal partitioning" in conventional databases and I think its a great idea. Having a writer process for each partition will make it scale.

Does Bigcouch have anything for the view files too or are they just sharding the backing files?

- Hans