Posted to dev@couchdb.apache.org by Adam Kocoloski <ko...@apache.org> on 2016/07/07 16:01:48 UTC

Re: CouchDB and Kubernetes

Kubernetes 1.3 adds a new concept called “PetSets” (yes, as in “Cattle vs. Pets”) geared towards our use case. Documentation is here:

https://github.com/kubernetes/kubernetes.github.io/blob/master/docs/user-guide/petset.md
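
To give a flavour of what that buys us, a minimal PetSet for a three-node CouchDB cluster might look something like the sketch below. Fair warning: this is untested, the alpha API may still change, the image tag is a placeholder, and a real deployment would also need the headless governing Service and persistent volumes. kubectl accepts JSON as well as YAML, so it’s easy enough to express as a Python dict:

    import json

    # Hypothetical minimal PetSet spec (Kubernetes 1.3, apps/v1alpha1 API).
    # Usage: python petset.py | kubectl create -f -
    petset = {
        "apiVersion": "apps/v1alpha1",
        "kind": "PetSet",
        "metadata": {"name": "couchdb"},
        "spec": {
            "serviceName": "couchdb",  # headless Service; gives each Pet a stable DNS name
            "replicas": 3,
            "template": {
                "metadata": {"labels": {"app": "couchdb"}},
                "spec": {
                    "containers": [{
                        "name": "couchdb",
                        "image": "klaemo/couchdb:2.0.0",  # placeholder image/tag
                        "ports": [{"containerPort": 5984}],
                    }],
                },
            },
        },
    }

    print(json.dumps(petset, indent=2))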

Adam

> On May 3, 2016, at 6:09 PM, Adam Kocoloski <ko...@apache.org> wrote:
> 
> :)
> 
> 2.0 will maintain a list of which database shards are hosted on which cluster nodes in the _dbs database. The trouble is that there’s a 1:1 fixed correspondence between an Erlang node and a CouchDB cluster node; i.e. there’s no way to remap a CouchDB cluster node to a new Erlang node. In a world where an Erlang node is identified by an IP address controlled by the cloud service provider or container framework this results in a fairly brittle setup. If we allow for an abstract notion of a CouchDB cluster node that can be remapped to different Erlang nodes we can go far here.
> 
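> For anyone who wants to see that coupling first-hand, here’s a rough Python sketch (untested; the URLs, ports, credentials and the “mydb” name are all placeholders) that reads the cluster membership and then the shard map for one database. Note that the keys under “by_node” are literally Erlang node names:
> 
>     import requests
> 
>     CLUSTER = "http://localhost:5984"    # clustered API port
>     BACKDOOR = "http://localhost:5986"   # node-local ("backdoor") API port
>     AUTH = ("admin", "secret")           # placeholder credentials
> 
>     # Erlang nodes currently joined to the cluster
>     membership = requests.get(CLUSTER + "/_membership", auth=AUTH).json()
>     print(membership["all_nodes"])       # e.g. ["couchdb@10.1.2.3", ...]
> 
>     # The shard map for a database is a document in the _dbs database;
>     # "by_node" maps an Erlang node name to the shard ranges it owns.
>     shard_map = requests.get(BACKDOOR + "/_dbs/mydb", auth=AUTH).json()
>     for node, ranges in shard_map["by_node"].items():
>         print(node, ranges)
> 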
> As an aside, there are a ton of subtleties that we’ve uncovered over the years when it comes to relocating shard files around a cluster. These days CouchDB is smart enough to know when a file has been moved, what node it was moved from, and what fraction of the anti-entropy internal replication can be reused to sync up the file in its new location. Pretty interesting stuff, and it’ll certainly be something we need to keep in mind if we pursue the aforementioned work. Cheers,
> 
> Adam
> 
>> On May 3, 2016, at 5:17 PM, Michael Fair <mi...@daclubhouse.net> wrote:
>> 
>> I think separating the database id, be it a shard id or the entire db,
>> from the execution node/context where that database lives, so that
>> the databases themselves can be mobile (or even duplicated) across multiple
>> execution nodes, makes perfect architectural sense to me.  Keeping a _peers
>> database which lists which databases are at which nodes makes sense to me.
>> 
>> It seems like each "database" being its own thing separate and apart from
>> the node it executes on is a cleaner model all around.
>> 
>> Great idea!
>> 
>> Mike
>> On Apr 29, 2016 7:55 PM, "Adam Kocoloski" <kocolosk@apache.org> wrote:
>> 
>>> Hi all,
>>> 
>>> I’ve been doing a bit of poking around the container orchestration space lately
>>> and looking at how we might best deploy a CouchDB 2.0 cluster in a
>>> container environment. In general I’ve been pretty impressed with the
>>> design point of the Kubernetes project, and I wanted to see how hard it
>>> would be to put together a proof of concept.
>>> 
>>> As a preamble, I needed to put together a container image for 2.0 that
>>> just runs a single Erlang VM instead of the container-local “dev cluster”.
>>> You can find that work here:
>>> 
>>> https://github.com/klaemo/docker-couchdb/pull/52
>>> 
>>> So far, so good - now for Kubernetes itself. My goal was to figure out how
>>> to deploy a collection of “Pods” that could discover one another and
>>> self-assemble into a cluster. Kubernetes differs from the traditional
>>> Docker network model in that every Pod gets an IP address that is routable
>>> from all other Pods in the cluster. As a result there’s no need for some of
>>> the port gymnastics that one might encounter with other Docker environments
>>> - each CouchDB pod can listen on 5984, 4369 and whatever distribution port
>>> you like on its own IP.
>>> 
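>>> (One practical wrinkle: the distribution port itself is normally picked at
>>> random by the Erlang VM, so to be able to declare it in the Pod spec you’d
>>> want to pin it, e.g. by adding “-kernel inet_dist_listen_min 9100” and
>>> “-kernel inet_dist_listen_max 9100” to vm.args; epmd on 4369 then tells
>>> peers to connect on 9100. The 9100 is just an arbitrary choice.)
>>> 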
>>> What you don’t get with Pods is a hostname that’s discoverable from other
>>> Pods in the cluster. A “Service” (a replicated, load-balanced collection of
>>> Pods) can optionally have a DNS name, but the Pods themselves do not. This
>>> throws a wrench in the most common distributed Erlang setup, where each
>>> node gets a name like “couchdb@FQDN” and the FQDNs are resolvable to IP
>>> addresses via DNS.
>>> 
>>> It is certainly possible to specify an Erlang node name like
>>> “couchdb@12.34.56.78”, but we need to be a
>>> bit careful here. CouchDB is currently forcing the Erlang node name to do
>>> “double-duty”: it’s both the way that the nodes in a cluster figure out how
>>> to route traffic to one another and the identifier with which nodes claim
>>> ownership over individual replicas of database shards in the shard map.
>>> Speaking from experience it’s often quite useful operationally to remap a
>>> given Erlang node name to a new server and have the new server be
>>> automatically populated with the replicas it’s supposed to own. If we use
>>> the Pod IP in Kubernetes for the node name we won’t have that luxury.
>>> 
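>>> To make that concrete: today, joining a Pod to the cluster amounts to
>>> writing a (mostly empty) document into the node-local _nodes database,
>>> keyed by the Erlang node name. A rough sketch, with the IPs and the
>>> credentials as placeholders:
>>> 
>>>     import requests
>>> 
>>>     AUTH = ("admin", "secret")           # placeholder credentials
>>>     MEMBER = "http://10.1.2.3:5986"      # node-local port on an existing member
>>> 
>>>     # Joining is just creating a doc whose _id is the new Erlang node name.
>>>     new_node = "couchdb@10.1.2.4"        # a Pod IP, per the scheme above
>>>     requests.put(MEMBER + "/_nodes/" + new_node, json={}, auth=AUTH)
>>> 
>>> If that Pod is later rescheduled and comes back on a different IP, the _id
>>> (and every shard map entry that references it) still says couchdb@10.1.2.4,
>>> which is exactly the luxury we lose.
>>> 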
>>> I think the best path forward here would be to extend the “Node” concept
>>> in a CouchDB cluster so that it has an identifier which is allowed to be
>>> distinct from the Erlang node name. The “CouchDB Node” is the one that owns
>>> database shard replicas, and it can be remapped to different distributed
>>> Erlang nodes over time via modification of an attribute in the _nodes DB.
>>> 
>>> Hope you all found this useful — I’m quite interested in finding ways to
>>> make it easier for users to acquire a highly-available cluster configured
>>> in the “right way”, and I think projects like Kubernetes have a lot of
>>> promise in this regard. Cheers,
>>> 
>>> Adam
> 


Re: CouchDB and Kubernetes

Posted by Adam Kocoloski <ko...@apache.org>.
Hi all, I wanted to pick up this thread now that 2.0 is out the door. James Mackenzie has been working on an example that shows how to provision a CouchDB cluster as a PetSet in Kubernetes using Clemens’ official 2.0.0 Docker image. We’ve been smoothing over the rough edges with the aim of submitting it upstream to Kubernetes once it’s ready. James started a PR on his own repo so we could do that prep work:

https://github.com/James1912/kubernetes/pull/1

Comments and ideas for simplification are very much welcome. Cheers,

Adam
