Posted to user@couchdb.apache.org by James Marca <jm...@translab.its.uci.edu> on 2010/03/01 08:01:51 UTC

Re: Replication question

On Mon, Mar 01, 2010 at 10:29:03AM +1300, Blair Nilsson wrote:
> It shouldn't be surprising, though: the target database may already
> have records in it that would change the results, which would be
> difficult to detect without running the map on all the data that was
> already there. Also it is quite likely that it would take longer to
> replicate all the view data than regenerate it. Hell, you may never
> use that view on the replicated end so transferring the processed data
> is a waste anyway.
> 

Okay, but I still think it is a bug.  Aside from specific document
conflicts, the rules for views are that identical input equals
identical output.  So the documents that replicate successfully from
one db to the other should produce identical output
from identical view code.  I don't know much about B-trees, but I
suspect there are algorithms to merge two B-trees efficiently.
If that is true and the view is already computed, isn't the laziest
response just to copy it over and merge it with the current view,
even if you have to somehow caveat the replication conflicts?

CouchDB seems intelligent enough in the view generation to notice when
docs have changed and only compute views on those docs, so why can't
similar code get thrown at this?
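The incremental behaviour described above can be shown with a toy model. The index below remembers the highest update sequence it has processed and re-runs the map function only for changes with a newer sequence, replacing whatever rows that document emitted before. Everything here is hypothetical: the IncrementalView class, the (seq, doc) change format and the by_sensor map function are made up for illustration, changes are assumed to arrive in increasing sequence order, and CouchDB's real indexer (B-tree based, with map functions usually written in JavaScript) is considerably more involved.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical types: a map function returns the (key, value) rows it "emits".
Row = Tuple[object, object]
MapFn = Callable[[dict], List[Row]]


class IncrementalView:
    """Toy view index that re-maps only documents changed since the last update.

    Hypothetical sketch of the bookkeeping idea, not CouchDB's B-tree indexer.
    """

    def __init__(self, map_fn: MapFn) -> None:
        self.map_fn = map_fn
        self.last_seq = 0                      # highest update sequence indexed so far
        self.rows: Dict[str, List[Row]] = {}   # doc id -> rows that doc emitted
        self.map_calls = 0                     # how often map_fn actually ran

    def update(self, changes: Iterable[Tuple[int, dict]]) -> None:
        """Apply (seq, doc) changes; entries at or below last_seq are skipped."""
        for seq, doc in changes:
            if seq <= self.last_seq:
                continue                       # already indexed: no map call needed
            if doc.get("_deleted"):
                self.rows.pop(doc["_id"], None)  # forget rows of a deleted doc
            else:
                self.rows[doc["_id"]] = self.map_fn(doc)  # replace the doc's old rows
                self.map_calls += 1
            self.last_seq = seq

    def query(self) -> List[Tuple[object, str, object]]:
        """Every row as (key, doc id, value), sorted by key, then doc id."""
        out = [(k, doc_id, v) for doc_id, rows in self.rows.items() for k, v in rows]
        return sorted(out, key=lambda r: (r[0], r[1]))


# Hypothetical map function: index sensor readings by sensor name.
def by_sensor(doc: dict) -> List[Row]:
    return [(doc["sensor"], doc["reading"])]


view = IncrementalView(by_sensor)
changes = [(1, {"_id": "a", "sensor": "s1", "reading": 3}),
           (2, {"_id": "b", "sensor": "s2", "reading": 5})]
view.update(changes)

# Later: "a" is edited and "c" is new. Seqs 1 and 2 are replayed but skipped.
changes += [(3, {"_id": "a", "sensor": "s1", "reading": 4}),
            (4, {"_id": "c", "sensor": "s1", "reading": 7})]
view.update(changes)

print(view.map_calls)  # 4
print(view.query())    # [('s1', 'a', 4), ('s1', 'c', 7), ('s2', 'b', 5)]
```

The second update walks four change entries but calls by_sensor only twice, which is the saving being described.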

As to whether copying the views is useful, I think it is
application-specific.  I've got a couple terabytes of data waiting in
the pipe to get processed this way, so actually, in my use case,
re-running the view is out of the question, and re-using views is the
height of efficiency.  And finally, I've only got two views (two
design documents) and I'm certainly going to be using them!

Regards, 

James Marca

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


Re: Replication question

Posted by James Marca <jm...@translab.its.uci.edu>.
On Mon, Mar 01, 2010 at 08:33:53AM -0800, J Chris Anderson wrote:
> 
> On Feb 28, 2010, at 11:01 PM, James Marca wrote:
> 
> > On Mon, Mar 01, 2010 at 10:29:03AM +1300, Blair Nilsson wrote:
> >> It shouldn't be surprising, though: the target database may already
> >> have records in it that would change the results, which would be
> >> difficult to detect without running the map on all the data that was
> >> already there. Also it is quite likely that it would take longer to
> >> replicate all the view data than regenerate it. Hell, you may never
> >> use that view on the replicated end so transferring the processed data
> >> is a waste anyway.
> >> 
> > 
> > Okay, but I still think it is a bug.  Aside from specific document
> > conflicts, the rules for views are that identical input equals
> > identical output.  So the documents that replicate successfully from
> > one db to the other should produce identical output
> > from identical view code.  I don't know much about B-trees, but I
> > suspect there are algorithms to merge two B-trees efficiently.
> > If that is true and the view is already computed, isn't the laziest
> > response just to copy it over and merge it with the current view,
> > even if you have to somehow caveat the replication conflicts?
> 
> It wouldn't be wrong to do this, but we certainly don't do it yet... complexity, time. We'll get there.


Yes, I apologize; "bug" is the wrong word, and "feature request" is
what I meant to say. I wish I could tackle this myself, but my time is
no longer my own these days.

> 
> 
> > 
> > CouchDB seems intelligent enough in the view generation to notice when
> > docs have changed and only compute views on those docs, so why can't
> > similar code get thrown at this?
> > 
> > As to whether copying the views is useful, I think it is
> > application-specific.  I've got a couple terabytes of data waiting in
> > the pipe to get processed this way, so actually, in my use case,
> > re-running the view is out of the question, and re-using views is the
> > height of efficiency.  And finally, I've only got two views (two
> > design documents) and I'm certainly going to be using them!
> > 
> 
> One thing you can do is merge the view queries without merging the databases. As long as you have identical view definitions and you can bridge the nodes with something like the CouchDB Lounge smartproxy, you should be good.

I just might try that.  Lounge looks like it's getting lots of
developer attention.  All I really want in the short term is to hide
merging the view queries from the client.  In the longer term though
I'd love to physically stick a CouchDB server on data collection boxes
in the field, so that collecting data becomes a simple pull
replication.

Regards, 

James Marca



Re: Replication question

Posted by J Chris Anderson <jc...@gmail.com>.
On Feb 28, 2010, at 11:01 PM, James Marca wrote:

> On Mon, Mar 01, 2010 at 10:29:03AM +1300, Blair Nilsson wrote:
>> It shouldn't be surprising, though: the target database may already
>> have records in it that would change the results, which would be
>> difficult to detect without running the map on all the data that was
>> already there. Also it is quite likely that it would take longer to
>> replicate all the view data than regenerate it. Hell, you may never
>> use that view on the replicated end so transferring the processed data
>> is a waste anyway.
>> 
> 
> Okay, but I still think it is a bug.  Aside from specific document
> conflicts, the rules for views are that identical input equals
> identical output.  So the documents that replicate successfully from
> one db to the other should produce identical output
> from identical view code.  I don't know much about B-trees, but I
> suspect there are algorithms to merge two B-trees efficiently.
> If that is true and the view is already computed, isn't the laziest
> response just to copy it over and merge it with the current view,
> even if you have to somehow caveat the replication conflicts?

It wouldn't be wrong to do this, but we certainly don't do it yet... complexity, time. We'll get there.


> 
> CouchDB seems intelligent enough in the view generation to notice when
> docs have changed and only compute views on those docs, so why can't
> similar code get thrown at this?
> 
> As to whether copying the views is useful, I think it is
> application-specific.  I've got a couple terabytes of data waiting in
> the pipe to get processed this way, so actually, in my use case,
> re-running the view is out of the question, and re-using views is the
> height of efficiency.  And finally, I've only got two views (two
> design documents) and I'm certainly going to be using them!
> 

One thing you can do is merge the view queries without merging the databases. As long as you have identical view definitions and you can bridge the nodes with something like the CouchDB Lounge smartproxy, you should be good.
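Merging view queries without merging databases amounts to a merge at query time: each node answers the same view query, and something in front of them (the Lounge smartproxy, in this suggestion) combines the answers. The Python sketch below is a hypothetical illustration of that idea, not the Lounge's implementation. It assumes each node returns rows sorted by (key, doc id), that the keys compare in Python the way CouchDB collates them (true for simple lowercase strings like these, but CouchDB's collation rules differ in general), that the reduce is a sum, and that no document lives on more than one node; a document stored on two nodes would have its rows counted twice.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

Row = Tuple[object, str, object]  # (key, doc id, value), one view row


def merge_map_rows(*node_results: Iterable[Row]) -> List[Row]:
    """Merge per-node map results that are each already sorted by (key, doc id).

    heapq.merge does a streaming k-way merge, so no full re-sort is needed.
    """
    return list(heapq.merge(*node_results, key=lambda row: (row[0], row[1])))


def merge_grouped_sums(*node_results: Iterable[Tuple[object, float]]) -> List[Tuple[object, float]]:
    """Combine group=true results of a sum-style reduce from several nodes.

    For a sum, the re-reduce step is simply another sum of the partial totals.
    """
    totals: Dict[object, float] = {}
    for rows in node_results:
        for key, partial in rows:
            totals[key] = totals.get(key, 0) + partial
    return sorted(totals.items())


# Hypothetical results from two nodes running the same view definition.
node_a = [("s1", "a", 4), ("s2", "b", 5)]
node_b = [("s1", "c", 7), ("s3", "d", 1)]
print(merge_map_rows(node_a, node_b))
# [('s1', 'a', 4), ('s1', 'c', 7), ('s2', 'b', 5), ('s3', 'd', 1)]
print(merge_grouped_sums([("s1", 4), ("s2", 5)], [("s1", 7), ("s3", 1)]))
# [('s1', 11), ('s2', 5), ('s3', 1)]
```

Reduce functions other than a sum need their own re-reduce step; an average, for example, cannot be recombined from per-node averages without the per-node counts.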
