Posted to dev@couchdb.apache.org by Nick Vatamaniuc <va...@apache.org> on 2019/11/19 17:10:57 UTC

[DISCUSS] Node types in CouchDB 4.x

Hi everyone,

I'd like to discuss the ability to have heterogeneous node types in CouchDB 4.

In CouchDB 2 and 3 the nodes in the cluster are usually similar, and
functionality is uniformly distributed amongst the nodes. That is, all
nodes can accept HTTP requests, run replication jobs, build indices,
etc. They are typically deployed such that they have similar hardware
requirements.

In an FDB-based CouchDB 4, CRUD operations on the Erlang nodes
wouldn't require as many resources, so it would be possible to have a
set of nodes performing just CRUD operations that are much smaller
than the equivalent CouchDB 2 and 3 nodes. However, indexing and
replication might still require heavy resource usage.

So the proposal is to add configuration to CouchDB 4 to allow some
nodes to perform only a subset of their current functionality. For
example, it would be possible to have 6 1-CPU nodes with 512MB of memory
each accepting API requests, and 2 4-CPU nodes with 4GB of memory each
running replication and indexing jobs only, or any other such
combinations. By default, with any extra configuration, the behavior
would stay as is today -- all nodes will run all the functionality.
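
To make the example above concrete, here is a minimal sketch in Python (not Erlang, and not taken from the RFC). The role names `api_frontend`, `replication` and `indexing`, the `roles` key, and the `enabled_roles` helper are all assumptions made up for illustration; the RFC may name and structure these differently. The one behavior it tries to show is the default: a node with no role setting runs everything.

```python
# Hypothetical role names, chosen for illustration only.
ALL_ROLES = frozenset({"api_frontend", "replication", "indexing"})


def enabled_roles(node_config):
    """Return the set of roles a node would run.

    A node with no explicit "roles" setting runs every role,
    which mirrors the proposed default behavior.
    """
    roles = node_config.get("roles")
    if roles is None:
        return ALL_ROLES
    unknown = set(roles) - ALL_ROLES
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    return frozenset(roles)


# The example cluster from the message: six small API nodes
# and two larger nodes that only run background jobs.
cluster = (
    [{"cpus": 1, "memory_mb": 512, "roles": ["api_frontend"]} for _ in range(6)]
    + [{"cpus": 4, "memory_mb": 4096, "roles": ["replication", "indexing"]}
       for _ in range(2)]
)
```

Calling `enabled_roles({})` on an unconfigured node returns all three roles, while the first six entries of `cluster` resolve to the API role only.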

I created an RFC exploring what it might look like:
https://github.com/apache/couchdb-documentation/pull/457

There is a comment there on how it could be implemented. So far it looks
like it could be fairly trivial, since it would build on the
couch_jobs work already in place.

What does everyone think?

Cheers,
-Nick

Re: [DISCUSS] Node types in CouchDB 4.x

Posted by Nick Vatamaniuc <va...@gmail.com>.

> I assume that should have been "without any extra configuration"?

Doh! Good find. It is a typo.

On Tue, Nov 19, 2019 at 12:17 PM Paul Davis <pa...@gmail.com> wrote:
>
> Sounds reasonable assuming you made a typo here:
>
> > By default, with any extra configuration, the behavior would stay as is today...
>
> I assume that should have been "without any extra configuration"?

Re: [DISCUSS] Node types in CouchDB 4.x

Posted by Paul Davis <pa...@gmail.com>.

Sounds reasonable assuming you made a typo here:

> By default, with any extra configuration, the behavior would stay as is today...

I assume that should have been "without any extra configuration"?

Re: [DISCUSS] Node types in CouchDB 4.x

Posted by Nick Vatamaniuc <va...@gmail.com>.

Oh I remember that discussion now. I even commented on it :-). I'll have to
add a reference to it in the RFC.

Looking through the comments, it seems there would be some work involved in
backporting it to 3.x -- storage nodes becoming non-storage nodes, and some
redesign around how _replicator docs are monitored and updated...

For 4.x, most of it becomes trivial given how couch_jobs' API is already
split into a frontend and a backend part. There it's just a matter of
wrapping the startup of those parts in case statements to check if they
should be enabled or disabled.
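
The idea of wrapping each part's startup in a check can be sketched roughly as follows. This is Python rather than Erlang, and it is not the couch_jobs API or the code in the draft PR: `start_enabled`, the subsystem names `jobs_frontend` and `jobs_backend`, and the settings layout are hypothetical names used only to show the shape of the check.

```python
def start_enabled(subsystems, settings):
    """Start each subsystem unless its setting disables it.

    subsystems: dict mapping a name to a zero-argument start function.
    settings:   dict mapping a name to True/False; a missing entry
                counts as enabled, so an unconfigured node starts everything.
    Returns the names of the subsystems that were started, in order.
    """
    started = []
    for name, start in subsystems.items():
        if settings.get(name, True):
            start()
            started.append(name)
    return started


# Stand-in start functions that just record that they ran.
log = []
subsystems = {
    "jobs_frontend": lambda: log.append("frontend up"),
    "jobs_backend": lambda: log.append("backend up"),
}

# A node configured to skip the job-processing side.
api_only = start_enabled(subsystems, {"jobs_backend": False})
```

Here `api_only` ends up as `["jobs_frontend"]`, and passing an empty settings dict would start both parts, matching the "everything enabled by default" behavior.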

I made a quick attempt at implementing it to see what it might look like in
a draft PR:

https://github.com/apache/couchdb/pull/2319

A few changes still need to be made, like switching to using
application environment vars, as per Jan's suggestion, but I think it
captures the general idea for the 4.x implementation.

Cheers,
-Nick

On Tue, Nov 19, 2019 at 7:01 PM Adam Kocoloski <ko...@apache.org> wrote:

> I’ve long been a fan of this sort of thing — see
> https://github.com/apache/couchdb/issues/1338 for an example ☺️
>
> Many of the node types are not really specific to the 4.0 architecture. I
> don’t know if anyone will be interested in backporting to the classic
> architecture, but API nodes, replicator nodes, etc. would definitely be
> relevant in 3.x.
>
> Adam
>
> > On Nov 19, 2019, at 11:54 AM, Nick Vatamaniuc <va...@gmail.com> wrote:
> >
> >> Isn’t it an approach used by Couchbase?
> > https://docs.couchbase.com/server/current/clustersetup/services-mds.html
> >
> > Thanks for the link. From taking a quick look over it, I think it is?
> > Though I am not familiar with exact terminology like "index" vs
> > "query" service in that context.
>

Re: [DISCUSS] Node types in CouchDB 4.x

Posted by Adam Kocoloski <ko...@apache.org>.

I’ve long been a fan of this sort of thing — see https://github.com/apache/couchdb/issues/1338 for an example ☺️

Many of the node types are not really specific to the 4.0 architecture. I don’t know if anyone will be interested in backporting to the classic architecture, but API nodes, replicator nodes, etc. would definitely be relevant in 3.x.

Adam

> On Nov 19, 2019, at 11:54 AM, Nick Vatamaniuc <va...@gmail.com> wrote:
> 
> 
>> Isn’t it an approach used by Couchbase?
> https://docs.couchbase.com/server/current/clustersetup/services-mds.html
> 
> Thanks for the link. From taking a quick look over it, I think it is?
> Though I am not familiar with exact terminology like "index" vs
> "query" service in that context.

Re: [DISCUSS] Node types in CouchDB 4.x

Posted by Nick Vatamaniuc <va...@gmail.com>.

Makes sense, thanks for explaining, ermouth!

On Wed, Nov 20, 2019 at 10:44 AM ermouth <er...@gmail.com> wrote:
>
> > with exact terminology like "index" vs
> > "query" service
>
> ‘Index service’ holds index trees, which may require different sharding
> than docs. Index service is mostly about IO, partitioning pre-calculated
> data records.
>
> ‘Query service’ runs queries, which may involve fetching index, complex
> sub-querying, filtering, grouping, etc. Something like view + _list call in
> CouchDB, often CPU-bound, so sharding is more about load balancing than
> data partitioning.
>
> ermouth
>
>
> On Tue, Nov 19, 2019 at 10:53 PM Nick Vatamaniuc <va...@gmail.com> wrote:
>
> > > Isn’t it an approach used by Couchbase?
> > https://docs.couchbase.com/server/current/clustersetup/services-mds.html
> >
> > Thanks for the link. From taking a quick look over it, I think it is?
> > Though I am not familiar with exact terminology like "index" vs
> > "query" service in that context.
> >

Re: [DISCUSS] Node types in CouchDB 4.x

Posted by ermouth <er...@gmail.com>.

> with exact terminology like "index" vs
> "query" service

‘Index service’ holds index trees, which may require different sharding
than docs. Index service is mostly about IO, partitioning pre-calculated
data records.

‘Query service’ runs queries, which may involve fetching index, complex
sub-querying, filtering, grouping, etc. Something like view + _list call in
CouchDB, often CPU-bound, so sharding is more about load balancing than
data partitioning.

ermouth


On Tue, Nov 19, 2019 at 10:53 PM Nick Vatamaniuc <va...@gmail.com> wrote:

> > Isn’t it an approach used by Couchbase?
> https://docs.couchbase.com/server/current/clustersetup/services-mds.html
>
> Thanks for the link. From taking a quick look over it, I think it is?
> Though I am not familiar with exact terminology like "index" vs
> "query" service in that context.
>

Re: [DISCUSS] Node types in CouchDB 4.x

Posted by Nick Vatamaniuc <va...@gmail.com>.

> Isn’t it an approach used by Couchbase?
https://docs.couchbase.com/server/current/clustersetup/services-mds.html

Thanks for the link. From taking a quick look over it, I think it is?
Though I am not familiar with exact terminology like "index" vs
"query" service in that context.

Re: [DISCUSS] Node types in CouchDB 4.x

Posted by ermouth <er...@gmail.com>.

Isn’t it an approach used by Couchbase?
https://docs.couchbase.com/server/current/clustersetup/services-mds.html

ermouth

