Posted to user@couchdb.apache.org by Time Less <ti...@gmail.com> on 2010/02/24 19:08:40 UTC

Several newbie CouchDB questions.

I am looking at CouchDB as a possible database for a future feature in
Firefox. However, I'm having trouble locating a good document for
architecture and to answer questions about (for example) how node failure is
handled. Of course I also have questions about data storage, backups,
recovery, etc.

This writeup seems Quite Good:
http://damienkatz.net/2005/04/couchdb_archite.html but appears to've been
written five years ago. Are all those design decisions still relevant? Were
all those features actually implemented?

At the highest level: Suppose I want to set up a database cluster. Suppose I
want to distribute the data across many nodes (but not all of them, in order
to scale my writes). Do I need Lounge if I want CouchDB to help me solve
this problem? If so, is there a good document about how it deals with node
failure, node additions, node subtractions? Is this a proper forum for
asking questions about Lounge?

Thanks in advance for infos and pointers!

-- 
timeless(ness)

Re: Several newbie CouchDB questions.

Posted by Randall Leeds <ra...@gmail.com>.
Looks like #lounge on Freenode is uninhabited. I will be idling there from
now on. Please join me.

Re: Several newbie CouchDB questions.

Posted by Randall Leeds <ra...@gmail.com>.
Hey,

Glad to see more conversation like this popping up.

I've been working with Markus as well as a couple other people in the
community who have approached me individually with Lounge questions or
problems.

Markus is right: oversharding [M shards, N nodes | M > N] allows you to grow
and shrink the cluster fairly easily. Theoretically it would be fairly
straightforward to create a tree of Lounges if you need to grow beyond M
nodes. Supporting replication into/out of the Lounge would get us 90% of the
way toward making this easy. I think this concept is even discussed briefly
in the O'Reilly book.
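
To make the oversharding idea concrete, here is a minimal sketch in Python. The node names, the shard count, and the use of CRC32 are all assumptions made for illustration; this is not Lounge's actual hashing scheme. The point is that a document's shard depends only on M, so changing N only means repointing shards.

```python
import zlib

NUM_SHARDS = 8  # M: fixed when the cluster is created (assumed value)

# Hypothetical shard map: index = shard number, value = hosting node (N = 2 < M).
shard_map = ["node-a", "node-b"] * 4


def shard_for(doc_id: str) -> int:
    """Map a document ID to a shard; depends only on M, never on N."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS


def node_for(doc_id: str, shards=shard_map) -> str:
    """Look up which node currently hosts the document's shard."""
    return shards[shard_for(doc_id)]


# Growing the cluster: hand shards 2 and 3 to a new node.
# No document changes shard; only the shard-to-node pointers change.
grown = list(shard_map)
grown[2] = grown[3] = "node-c"
```

Only documents living in the repointed shards end up on a different node, which is why the number of shards M is the practical ceiling on cluster size.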

In environments that receive regular writes an update-notifier script is all
that is needed to periodically refresh nodes that have missed updates (due
to downtime, etc). Lounge relies on this for consistency since it is
currently designed to write to any replica of a shard.
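
A rough sketch of the notifier side, in Python. It assumes the script is handed one JSON object per line, shaped like {"type": "updated", "db": "<name>"}; that format and the function name are assumptions for illustration, and this is not Lounge's actual script. It only works out which databases need a refresh.

```python
import json


def databases_to_refresh(lines):
    """Collect the names of databases reported as updated.

    Assumes one JSON object per line, shaped like
    {"type": "updated", "db": "<name>"}; anything else is ignored.
    """
    updated = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except ValueError:
            continue  # skip lines that are not valid JSON
        if (
            isinstance(event, dict)
            and event.get("type") == "updated"
            and "db" in event
        ):
            updated.add(event["db"])
    return sorted(updated)
```

A real notifier would pass sys.stdin as the lines argument and then trigger replication for each returned database.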

I'm fine to continue discussing Lounge here, but if consensus is that we
should have a separate list please recommend a solution.

I don't believe that the CouchDB wiki is the right place for documentation,
but there should be a clustering FAQ or something which links to any
available projects (currently only Lounge is public, I believe).

I would absolutely love to get the documentation sorted out and clean up
what's left of the googlecode -> github migration. I'll try to do that this
weekend.

I'm always happy to see community interest. Get in touch if you'd like to
help with anything and let me know if I can answer your questions.

Randall

Re: Several newbie CouchDB questions.

Posted by Markus Jelsma <ma...@buyways.nl>.
Actually, on some level it does deal with node failure and cluster changes.

Failures are handled gracefully. Once you have a decent sharded
cluster installed, you can actually shut nodes down (as if it's a failure)
and keep it running. I have a test setup with 4 virtual machines, each
running dumb- and smartproxy. It has been sharded on a level that allows
me to shut half the cluster down while pulling data from it using Siege or
ApacheBench; everything just goes a bit slower. The only thing you need to
keep in mind is that the node you use for access (in my case all 4 grant
access) isn't down; but that can be remedied.

The only thing that can fail during reads is a view that needs to
aggregate data from the nodes. The total resultset can be smaller than
anticipated if a node fails during that process. The final resultset won't
be corrupted though.
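
The "smaller but not corrupted" behaviour can be sketched like this in Python. The function name and the row layout are assumptions for illustration, not the smartproxy's actual code, and plain string comparison stands in for CouchDB's real key collation.

```python
import heapq


def merge_view_rows(shard_results):
    """Merge per-shard view rows, each list already sorted by key.

    shard_results maps a shard name to its rows, or to None when the node
    hosting that shard did not answer. Returns (rows, missing_shards).
    """
    available = [rows for rows in shard_results.values() if rows is not None]
    missing = sorted(
        name for name, rows in shard_results.items() if rows is None
    )
    # Rows from the shards that did answer stay correctly ordered;
    # rows from the missing shards are simply absent.
    merged = list(heapq.merge(*available, key=lambda row: row["key"]))
    return merged, missing
```

Returning the list of missing shards alongside the rows lets the caller tell a complete result from a partial one.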

Pushing data to the cluster while, for instance, one node is down is a bit
more complicated because you really need to replicate the changes made to
the sharded databases back to the dead node. This must be done manually
before it joins the cluster again. Anyway, it would be a nice feature if
the cluster could repopulate a dead node automatically when it comes back up.
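
The manual catch-up step amounts to one replication per shard database. Below is a minimal Python sketch that builds, but does not send, a POST to CouchDB's /_replicate endpoint. The host names and shard database name in the usage note are made up, and the helper name is hypothetical.

```python
import json
import urllib.request


def replication_request(healthy_node, recovered_node, shard_db):
    """Build (without sending) a POST to /_replicate on the recovered node
    that copies one shard database over from a healthy replica."""
    body = {
        "source": f"{healthy_node}/{shard_db}",
        "target": f"{recovered_node}/{shard_db}",
    }
    return urllib.request.Request(
        url=f"{recovered_node}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, replication_request("http://node-a:5984", "http://node-b:5984", "shard3") describes refreshing shard3 on node-b from node-a; passing the result to urllib.request.urlopen would actually start it.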

Dealing with cluster changes is a challenge. Adding more nodes to the
cluster is quite easy but reducing is very complicated because it was
already sharded. At this moment, you would need to pull the data from the
cluster, reconfigure the shardmap to fit a reduced cluster, and populate
it again. But beware, growing the cluster will be a tough job if you
haven't given it enough thought up front. By oversharding the cluster,
growth can be accommodated easily - it's just a matter of pointing shards
to another node and copying those sharded databases to the new node. Well,
it isn't that easy but shouldn't give you a headache.
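
As a sketch of what "pointing shards to another node" could look like when planning growth, here is a small Python helper. The shard map layout (a list indexed by shard number) and the function name are assumptions for illustration, not Lounge's shardmap format. It assumes the new node is not already in the map.

```python
from collections import Counter


def plan_moves(shard_map, new_node):
    """Return (new_map, moves) after adding new_node to the cluster.

    Takes shards one at a time from whichever node hosts the most,
    until the new node holds its fair share: shard count // node count.
    Each move is (shard_number, old_node, new_node).
    """
    new_map = list(shard_map)
    load = Counter(new_map)
    target = len(new_map) // (len(load) + 1)
    moves = []
    while load[new_node] < target:
        donor = max(
            (node for node in load if node != new_node),
            key=load.__getitem__,
        )
        shard = new_map.index(donor)  # first shard still hosted by the donor
        new_map[shard] = new_node
        load[donor] -= 1
        load[new_node] += 1
        moves.append((shard, donor, new_node))
    return new_map, moves
```

Each entry in moves is a shard database that has to be copied to the new node before the updated map goes live.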

Although we aren't using it in production, we will someday. Perhaps the
Lounge developers and production users can say something about their
experience and feature requests.


Time Less said:
>
> I've looked over what little there is, and it appears to me Lounge
> doesn't deal with node failure or cluster size changes (i.e.,
> adding/subtracting nodes in the cluster). It looks like it's merely two
> components for distributing reads/writes and giving some map/reduce
> functionality.
>
> --
> timeless(ness)




Re: Several newbie CouchDB questions.

Posted by Time Less <ti...@gmail.com>.
> > At the highest level: Suppose I want to set up a database cluster.
> > Suppose I want to distribute the data across many nodes....
>
> At this moment you would indeed need Lounge. Unfortunately, there isn't
> really good documentation on how it actually works and why it works like
> that. There is, however, documentation on how to install the several
> components.
>

I've looked over what little there is, and it appears to me Lounge doesn't
deal with node failure or cluster size changes (i.e., adding/subtracting nodes
in the cluster). It looks like it's merely two components for distributing
reads/writes and giving some map/reduce functionality.

-- 
timeless(ness)

Re: Several newbie CouchDB questions.

Posted by Markus Jelsma <ma...@buyways.nl>.
Time Less said:
> I am looking at CouchDB as a possible database for a future feature in
> Firefox. However, I'm having trouble locating a good document for
> architecture and to answer questions about (for example) how node
> failure is handled. Of course I also have questions about data storage,
> backups, recovery, etc.
>
> This writeup seems Quite Good:
> http://damienkatz.net/2005/04/couchdb_archite.html but appears to've
> been written five years ago. Are all those design decisions still
> relevant? Were all those features actually implemented?
>
> At the highest level: Suppose I want to set up a database cluster.
> Suppose I want to distribute the data across many nodes (but not all of
> them, in order to scale my writes). Do I need Lounge if I want CouchDB
> to help me solve this problem? If so, is there a good document about how
> it deals with node failure, node additions, node subtractions? Is this a
> proper forum for asking questions about Lounge?

At this moment you would indeed need Lounge. Unfortunately, there isn't
really good documentation on how it actually works and why it works like
that. There is, however, documentation on how to install the several
components.

At this moment, documentation is still spread between GitHub and
Google Code. Perhaps now would be a good moment to invest some additional
time and add documentation, which I'd prefer to put on CouchDB's own wiki
instead, but I'm unsure if the contributors will be very happy with that.

At least, it works quite well and you can ask about it on this
mailing list; there is no other public place to discuss it at this moment.

>
> Thanks in advance for infos and pointers!
>
> --
> timeless(ness)