Posted to dev@couchdb.apache.org by Benoit Chesneau <bc...@gmail.com> on 2014/07/04 11:34:59 UTC

question: mix between the sharding level and couch_* applications

Hi all,

While doing a review on the bigcouch branch (for the rcouch merge) I
found that the sharding level is mixed with the current applications
and I wonder if we could skip some of it.

The easiest one to address is probably the usage of chttpd in
couch_mrview:

src/couch_mrview/src/couch_mrview_http.erl
src/couch_mrview/src/couch_mrview_show.erl

Why not have these changes directly in chttpd? I could port them into
the couch_httpd application (which contains the standalone HTTP API)
in rcouch. It would also ease that merge.

For the others I am not sure, but they could perhaps be implemented
differently. Or at least, can someone provide the details for each?
Why are they needed there?

mem3 is used in couch, couch_index, couch_mrview and couch_replicator:

src/couch/src/couch_changes.erl
src/couch/src/couch_db.erl
src/couch/src/couch_db_updater.erl
src/couch/src/couch_server.erl
src/couch/src/couch_util.erl
src/couch_index/src/couch_index_server.erl
src/couch_mrview/src/couch_mrview.erl
src/couch_replicator/src/couch_replicator.app.src
src/couch_replicator/src/couch_replicator_manager.erl

fabric is the other one:

src/couch/src/couch_changes.erl
src/couch/src/couch_db.erl
src/couch/src/couch_db_updater.erl
src/couch_index/src/couch_index_server.erl
src/couch_mrview/src/couch_mrview_show.erl


I am not sure that fabric and mem3 need to be there. I haven't tested
it yet, but do they prevent the standalone usage of couch? I.e., can we
prevent completely the sharding level?

Just some questions. I am really interested in the answers. I am asking
in view of the rcouch merge. Also I think it's interesting to have
a clear understanding of the new architecture and how things need to
be articulated all together. It would also help anyone who wants to
ship custom versions of Apache CouchDB internally. Maybe people
from Cloudant and some others have already thought about that?



- benoit

Re: question: mix between the sharding level and couch_* applications

Posted by Robert Samuel Newson <rn...@apache.org>.
Hi,

These are great questions, thank you.

Yes, single node couchdb still works, that’s a critical goal for the merge. Port 5986 points directly to the couch_http*.erl modules we all know. Where they might make calls to fabric/mem3 they do so conditionally. If any couchdb 1.6 API is missing or broken on port 5986, then that’s a regression that needs fixing before the release.

As for "preventing completely the sharding level" there are two answers. Initially, for the merge and the 2.0 release, yes, absolutely, we should include the couch_http* modules. Going forward, it’s important to remove that. A single node installation of couchdb 2.0 should run through the full codebase (chttpd to fabric to rexi, calling mem3 as appropriate) so that there are no surprises when moving to multiple nodes.

As you’ve seen, the core of couchdb has been extracted into couchdb-couch.git. What I hope to see after the 2.0 merge is the further whittling of that repo down to its essentials (similar to what you’ve done in cowdb). Perhaps couchdb-couch becomes multiple smaller repositories, extracting the {couch_file, couch_btree, couch_db, couch_db_updater} heart of couchdb.

I can’t speak to what work is required to add clustering to new rcouch features but I hope I’ve not made it harder to merge them into single-node (couchdb-couch.git) couch.

To take one example, the couch_replicator_manager has custom work for this merge specifically to allow it to work correctly for clustered and non-clustered _replicator databases (indeed, both at once). It does not call fabric (the original bigcouch version did) and uses mem3 to determine local ownership of shards (but only for clustered databases).
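
The ownership check described above can be sketched roughly as follows. This is an illustration in Python rather than the actual Erlang code; every name in it is hypothetical, and the "shards/" prefix used to recognise a clustered database is an assumption of the sketch, not something stated in this thread.

```python
# Hypothetical sketch: none of these names come from the CouchDB source.
SHARD_PREFIX = "shards/"  # assumption: shard databases are named under this prefix


def is_clustered(db_name):
    """Treat a database as clustered if its name carries the shard prefix."""
    return db_name.startswith(SHARD_PREFIX)


def should_process(db_name, doc_id, local_node, owner_of):
    """Decide whether this node should act on a _replicator document.

    owner_of(db_name, doc_id) stands in for a membership lookup (the role
    mem3 plays in the description above). It is only consulted for
    clustered databases.
    """
    if not is_clustered(db_name):
        # Non-clustered database: no membership lookup is needed.
        return True
    return owner_of(db_name, doc_id) == local_node
```

With this shape, a node can serve clustered and non-clustered _replicator databases side by side, because the membership lookup is never reached for the non-clustered case.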

In summary, we (Cloudant as a whole, and myself as primary actor) are taking great pains to ensure unclustered couchdb works for this release, for at least the reason that it helps everyone verify that we’ve forgotten nothing in this really quite complicated code merging process. All the existing tests should still pass, for what that’s worth.

B.


Re: question: mix between the sharding level and couch_* applications

Posted by Paul Davis <pa...@gmail.com>.
On Fri, Jul 4, 2014 at 4:34 AM, Benoit Chesneau <bc...@gmail.com> wrote:
> Hi all,
>
> While doing a review on the bigcouch branch (for the rcouch merge) I
> found that the sharding level is mixed with the current applications
> and I wonder if we could skip some of it.
>
> The easiest one to address is probably the usage of chttpd in
> couch_mrview:
>
> src/couch_mrview/src/couch_mrview_http.erl
> src/couch_mrview/src/couch_mrview_show.erl
>
> Why not have these changes directly in chttpd? I could port them into
> the couch_httpd application (which contains the standalone HTTP API)
> in rcouch. It would also ease that merge.
>

These reference chttpd because that's where we implemented the
delayed response code. I think the answer in this case is to move the
chttpd function definitions to couch_httpd and then have chttpd proxy
through to them, as it does for most of its library functions of this
nature.
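
A rough sketch of that arrangement, in Python rather than Erlang: the implementation lives in the standalone layer and the clustered layer only forwards to it. The class names and the start_delayed_response function are hypothetical and do not correspond to real couch_httpd or chttpd exports.

```python
# Hypothetical sketch; the class and function names are illustrative only.
class StandaloneHttp:
    """Stands in for the single-node HTTP layer (couch_httpd in this thread)."""

    @staticmethod
    def start_delayed_response(status, headers):
        # The real logic would live here, with no dependency on clustering.
        return {"status": status, "headers": dict(headers), "chunks": []}


class ClusteredHttp:
    """Stands in for the clustered HTTP layer (chttpd in this thread)."""

    @staticmethod
    def start_delayed_response(status, headers):
        # Thin proxy: forward to the standalone implementation.
        return StandaloneHttp.start_delayed_response(status, headers)
```

Callers such as the view modules could then depend on the standalone layer only, while existing callers of the clustered layer keep working through the proxy.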

> For the others I am not sure, but they could perhaps be implemented
> differently. Or at least, can someone provide the details for each?
> Why are they needed there?
>
> mem3 is used in couch, couch_index, couch_mrview and couch_replicator:
>
> src/couch/src/couch_changes.erl
> src/couch/src/couch_db.erl
> src/couch/src/couch_db_updater.erl
> src/couch/src/couch_server.erl
> src/couch/src/couch_util.erl
> src/couch_index/src/couch_index_server.erl
> src/couch_mrview/src/couch_mrview.erl
> src/couch_replicator/src/couch_replicator.app.src
> src/couch_replicator/src/couch_replicator_manager.erl
>
> fabric is the other one:
>
> src/couch/src/couch_changes.erl
> src/couch/src/couch_db.erl
> src/couch/src/couch_db_updater.erl
> src/couch_index/src/couch_index_server.erl
> src/couch_mrview/src/couch_mrview_show.erl
>
>
> I am not sure that fabric and mem3 need to be there. I haven't tested
> it yet, but do they prevent the standalone usage of couch? I.e., can we
> prevent completely the sharding level?
>

Most of this is because CouchDB was not 100% designed to work in a
cluster. For instance, everywhere that there's an assumption that it
can just grab all design documents, we need to insert shims so that
it'll be a global call instead of trying to just check a local file. I
definitely consider each of these a wart and would like to be able to
remove them, but unfortunately that can be quite difficult in some
cases. In the future I think it'd be a good idea to figure out how
to pull these out as much as possible.
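
To make the shim idea concrete, here is a minimal sketch in Python (not the Erlang in the tree). The function and parameter names are hypothetical; the two reader callables stand in for a local file read and a cluster-wide call respectively.

```python
# Hypothetical sketch; names are illustrative, not CouchDB functions.
def get_design_docs(db_name, read_local, read_cluster, clustered):
    """Return the design documents for db_name.

    read_local(db_name)   -> design docs read from the local database file
    read_cluster(db_name) -> design docs gathered through a cluster-wide call
    clustered             -> True when db_name belongs to a clustered database
    """
    if clustered:
        # The shim: make a global call rather than checking only the local file.
        return read_cluster(db_name)
    return read_local(db_name)
```

Each such branch is one of the warts mentioned above: the core code has to know that a cluster-wide path exists at all.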

> Just some questions. I am really interested in the answers. I am asking
> in view of the rcouch merge. Also I think it's interesting to have
> a clear understanding of the new architecture and how things need to
> be articulated all together. It would also help anyone who wants to
> ship custom versions of Apache CouchDB internally. Maybe people
> from Cloudant and some others have already thought about that?
>
>
>
> - benoit

Bob's reply I think gives the general overview. For the merge we're
quite focused on making sure that the single node interface to CouchDB
has zero regressions. And internally we use the single-node interface
quite often even in the case of a cluster for working at the node
level which can be required for some operations.

On the other hand, I think the question Benoit is really asking is
whether we can ship a working single node without any of the
clustering code at all. I honestly don't know. Most of the cases I
think would work fine, but it may be possible that we have something
in there that'd prevent it. In a perfect world we could, but I would
wager we'd need to exert some effort to make it a reality. I don't
think it's a case we had really considered, though perhaps it's
something we should be considering.

I would say I don't really think that requiring fabric/mem3 etc. to
ship for single node CouchDB should be a blocker to merging, seeing as
we've never had that sort of separation before. Though I would agree
it's work we should be looking into doing to enable it after the merge.