Posted to user@couchdb.apache.org by Stephen Bartell <sn...@gmail.com> on 2013/01/30 08:53:27 UTC

replication by view

Benoit, I'm curious about a comment you made earlier (from Nathan's half-baked idea) and have a few questions that spawn from it:

```
There is another solution though, replication using a view change and real
replication using a view like in rcouch [1]. With the validate_doc_read
function [2]  you can do that. No need for a background process.  Hopefully
this will be merged in couchdb. This have been tested on a relatively large
scale
```

1) If I understand the couch docs correctly, using views as replication filters yields no performance benefit.  It's just like running plain-jane filters.  I figure that's why you are mentioning rcouch, where the replication source is indexed data.  I just want to confirm that I understand this.

2) For each replicator filter, or view that is run, is a single couchjs process spawned to handle evaling the javascript?

3) What if we write our views in erlang and then use those as replication filters?  Could we get around the extra overhead of couchjs processes this way?

4) Can filters be erlang, thus getting around extra couchjs process overhead?
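For reference, the "view as filter" case from question 1 can be sketched as a changes-feed request that uses stock CouchDB's built-in `_view` filter. This is only a minimal sketch that builds the URL without contacting a server; the host, database, design document and view names are made up for illustration, and `view_filtered_changes_url` is a hypothetical helper, not part of any CouchDB client.

```python
from urllib.parse import urlencode


def view_filtered_changes_url(base_url, db, design_doc, view_name, since=0):
    """Build a _changes URL that filters through a view's map function.

    With filter=_view the server still walks the changes feed and runs
    the map function on each changed document; it does not read the
    view index. That is why it performs like an ordinary filter.
    """
    query = urlencode({
        "filter": "_view",                     # built-in filter name
        "view": f"{design_doc}/{view_name}",   # "<design doc>/<view name>"
        "since": since,                        # resume point in the feed
    })
    return f"{base_url}/{db}/_changes?{query}"


# Hypothetical names, for illustration only.
url = view_filtered_changes_url(
    "http://localhost:5984", "mydb", "app", "by_owner", since=0
)
print(url)
```

The same `filter` value and `view` query parameter are what a replication using a view as its filter would pass along, so a first-time replication set up this way still has to traverse every change in the source database.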

Thanks for any insight on these newb questions.

sb


Re: replication by view

Posted by Benoit Chesneau <bc...@gmail.com>.
On Wed, Jan 30, 2013 at 8:24 PM, Jens Alfke <je...@couchbase.com> wrote:

>
>
> Is there any chance this feature will be folded back into the main CouchDB
> source tree?
>
>
There is a good chance. Once the branch for the rcouch merge is set up,
I will propose a patch for it as well.

- benoît

Re: replication by view

Posted by Jens Alfke <je...@couchbase.com>.
On Jan 30, 2013, at 10:06 AM, Stephen Bartell <sn...@gmail.com> wrote:

> This seems like a huge improvement! Replicating with filters always bothered me because a first time replication would need to traverse the entire database.  Not to mention the timeout issues which could occur if the db is massive.  I just figured it was a necessary evil.

I agree. Some users of TouchDB have been having performance problems doing filtered pulls from upstream CouchDB databases, for this reason. Filtered replication is important for mobile clients because they may not have the bandwidth or storage space to replicate an entire server-side database.

Is there any chance this feature will be folded back into the main CouchDB source tree?

—Jens

Re: replication by view

Posted by Stephen Bartell <sn...@gmail.com>.
On Jan 30, 2013, at 12:13 AM, Benoit Chesneau <bc...@gmail.com> wrote:

> On Wed, Jan 30, 2013 at 8:53 AM, Stephen Bartell <sn...@gmail.com>wrote:
> 
>> Benoit,  I'm curious about a comment you made earlier (from Nathans
>> half-baked idea) and have a few questions to spawn from it:
>> 
>> ```
>> There is another solution though, replication using a view change and real
>> replication using a view like in rcouch [1]. With the validate_doc_read
>> function [2]  you can do that. No need for a background process.  Hopefully
>> this will be merged in couchdb. This have been tested on a relatively large
>> scale
>> ```
>> 
>> 1) If I understand the couch docs correctly, using views as replication
>> filters yields no performance benefit.  Its just like running plain jane
>> filters.  I figure thats why you are mentioning rcouch where the
>> replication source is indexed data.  I just want to clarify that I
>> understand this.
>> 
> 
> In rcouch the implementation is different this is a real view changes: each
> time a change is indexed in a view an event is sent. All changes are also
> indexed in a btree so you can use a since= parameter. It is working like
> changes on the document database but on an index.
> 
> Since it's a real view you can also query changes in a range
> (start_key/end_key ) or for a key.
> 

This seems like a huge improvement! Replicating with filters always bothered me because a first time replication would need to traverse the entire database.  Not to mention the timeout issues which could occur if the db is massive.  I just figured it was a necessary evil.

> 
>> 2) For each replicator filter, or view that is run, is a single couchjs
>> process spawned to handle evaling the javascript?
>> 
> 
> No. It is just getting changes from a the index. There is no couchjs
> process used except if you also want to filter changes in that view.

Right, but in the case of a javascript filter (not the uber-sweet view changes), is a single couchjs process spawned per filter?

> 
> 
>> 
>> 3) What if we write our views in erlang and then use those as replication
>> filters?  Could we get around the extra overhead of couchjs processes this
>> way?
>> 
> 
> 
> Again see above; View changes is not about filtering. Though you can filter
> these changes with an erlang function.
> 
>> 
>> 4) Can filters be erlang, thus getting around extra couchjs process
>> overhead?
>> 
> 
> Yes but it won't be sandboxed.
> 
> 
> - benoît


Re: replication by view

Posted by Benoit Chesneau <bc...@gmail.com>.
On Wed, Jan 30, 2013 at 6:40 PM, Jens Alfke <je...@couchbase.com> wrote:

>
> On Jan 30, 2013, at 12:13 AM, Benoit Chesneau <bc...@gmail.com> wrote:
>
> > In rcouch the implementation is different this is a real view changes:
> each
> > time a change is indexed in a view an event is sent. All changes are also
> > indexed in a btree so you can use a since= parameter. It is working like
> > changes on the document database but on an index.
>
> I thought rcouch was just a distribution of CouchDB, not something with
> significant implementation differences? The readme on Github doesn’t imply
> that it has any new features over CouchDB. Where could I learn more about
> this feature?
>

Right, the website [1] tells more than the readme. I will fix that. You can
find some documentation about this feature on the wiki [2]. There are some
other features in rcouch, like validate on read, and some internal changes
to allow hot upgrades.


> (I’m curious because this sounds similar to what I’m working on for
> BaseCouch, aka the Couchbase sync gateway — in our case it’s more
> specialized, a kind of efficient filtered replication by tags attached to
> documents.)
>

If you're interested in the implementation, you can have a look at this
commit [3].

>
> —Jens


Hope it helps,

- benoît

[1] http://rcouch.org
[2] https://github.com/refuge/rcouch/wiki/View-Changes
[3] https://github.com/refuge/couch_core/commit/14ad584c993e7a25c000d8db905fd0d0a88a24b4

Re: replication by view

Posted by Jens Alfke <je...@couchbase.com>.
On Jan 30, 2013, at 12:13 AM, Benoit Chesneau <bc...@gmail.com> wrote:

> In rcouch the implementation is different this is a real view changes: each
> time a change is indexed in a view an event is sent. All changes are also
> indexed in a btree so you can use a since= parameter. It is working like
> changes on the document database but on an index.

I thought rcouch was just a distribution of CouchDB, not something with significant implementation differences? The readme on Github doesn’t imply that it has any new features over CouchDB. Where could I learn more about this feature?

(I’m curious because this sounds similar to what I’m working on for BaseCouch, aka the Couchbase sync gateway — in our case it’s more specialized, a kind of efficient filtered replication by tags attached to documents.)

—Jens

Re: replication by view

Posted by Benoit Chesneau <bc...@gmail.com>.
On Wed, Jan 30, 2013 at 8:53 AM, Stephen Bartell <sn...@gmail.com> wrote:

> Benoit,  I'm curious about a comment you made earlier (from Nathans
> half-baked idea) and have a few questions to spawn from it:
>
> ```
> There is another solution though, replication using a view change and real
> replication using a view like in rcouch [1]. With the validate_doc_read
> function [2]  you can do that. No need for a background process.  Hopefully
> this will be merged in couchdb. This have been tested on a relatively large
> scale
> ```
>
> 1) If I understand the couch docs correctly, using views as replication
> filters yields no performance benefit.  Its just like running plain jane
> filters.  I figure thats why you are mentioning rcouch where the
> replication source is indexed data.  I just want to clarify that I
> understand this.
>

In rcouch the implementation is different: these are real view changes. Each
time a change is indexed in a view, an event is sent. All changes are also
indexed in a btree, so you can use a since= parameter. It works like
changes on the document database, but on an index.

Since it's a real view, you can also query changes in a range
(start_key/end_key) or for a single key.
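To make the idea concrete, here is a toy in-memory model of a per-view change index. It is only an illustration of the behaviour described above (changes ordered by sequence, resumable with since=, restrictable to a key range); it is not rcouch's implementation, which keeps the changes in a btree on disk, and the class and method names are invented for this sketch.

```python
import bisect


class ViewChangeIndex:
    """Toy model of the changes recorded for one view, ordered by sequence."""

    def __init__(self):
        self._seqs = []     # sequence numbers, kept in increasing order
        self._entries = []  # (seq, key, doc_id) tuples, parallel to _seqs

    def record(self, seq, key, doc_id):
        # Called whenever the indexer emits a row for this view.
        # Assumes sequence numbers arrive in increasing order.
        self._seqs.append(seq)
        self._entries.append((seq, key, doc_id))

    def changes(self, since=0, start_key=None, end_key=None):
        # Jump straight to the first entry after `since` rather than
        # scanning from the beginning of the database.
        start = bisect.bisect_right(self._seqs, since)
        result = []
        for seq, key, doc_id in self._entries[start:]:
            if start_key is not None and key < start_key:
                continue
            if end_key is not None and key > end_key:
                continue
            result.append((seq, key, doc_id))
        return result


index = ViewChangeIndex()
index.record(1, "alice", "doc1")
index.record(2, "bob", "doc2")
index.record(5, "alice", "doc3")

print(index.changes(since=1))
print(index.changes(start_key="alice", end_key="alice"))
```

Only documents that produced rows in the view ever appear in this index, so a consumer such as the replicator never has to look at the rest of the database.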


> 2) For each replicator filter, or view that is run, is a single couchjs
> process spawned to handle evaling the javascript?
>

No. It is just getting changes from the index. No couchjs process is
used unless you also want to filter the changes in that view.


>
> 3) What if we write our views in erlang and then use those as replication
> filters?  Could we get around the extra overhead of couchjs processes this
> way?
>


Again, see above: view changes are not about filtering, though you can
filter these changes with an Erlang function.

>
> 4) Can filters be erlang, thus getting around extra couchjs process
> overhead?
>

Yes, but it won't be sandboxed.


- benoît