Posted to user@couchdb.apache.org by Julien Guimont <ju...@msc-mobile.com> on 2008/10/26 15:19:11 UTC

Getting only updates from a view

Hello,

I am looking at CouchDB for a new project. So far it matches a lot of  
requirements that would require ugly hacks using a traditional DB.

One requirement I have is to always have the delta results of a view
(which documents matching the view have changed since the last update).

I read http://wiki.apache.org/couchdb/RegeneratingViewsOnUpdate

and I understood that the views are being reprocessed only with the  
documents that have changed. (Am I wrong?)

Well, if I am right, how can I get the view results only for those
changed documents?

Thank you!
Julien.

Re: Getting only updates from a view

Posted by Dean Landolt <de...@deanlandolt.com>.
On Sun, Oct 26, 2008 at 11:54 PM, kowsik <ko...@gmail.com> wrote:

> The view server (couchjs) today is primarily a local process
> interacting with couch through stdin/stdout. This is the one that does
> all the map/reduce. I can envision at some point (not sure if and when
> it's planned), the view servers executing on remote machines through
> the exact same line-based JSON protocol (though using TCP) to heavily
> parallelize the map/reduce process. The incremental set of documents
> to be indexed can be easily chunked and distributed across this
> cluster. Until that's possible, the single machine solution is going
> to require some serious compute power (and lots of memory).
>
> IMHO, this [potential] parallelization (in addition to all the
> benefits of document-centric storage) is what makes couch incredibly
> attractive from a scaling perspective.


From what I've gleaned from blog posts and other reading, distributed
map/reduce is on the roadmap, but implemented in a more couchy manner (over
HTTP). Partial replication is one approach I've heard bandied about -- and
once that lands, depending on how the API looks, it should be easy to shard
documents based on some partition function (or some such) and assemble the
results. There may even be a more explicit distributed solution in the
works, but I doubt it would use pipes for anything other than local
communication.

Re: Getting only updates from a view

Posted by Nick Johnson <ar...@notdot.net>.
On Mon, Oct 27, 2008 at 3:54 AM, kowsik <ko...@gmail.com> wrote:

> The view server (couchjs) today is primarily a local process
> interacting with couch through stdin/stdout. This is the one that does
> all the map/reduce. I can envision at some point (not sure if and when
> it's planned), the view servers executing on remote machines through
> the exact same line-based JSON protocol (though using TCP) to heavily
> parallelize the map/reduce process. The incremental set of documents
> to be indexed can be easily chunked and distributed across this
> cluster. Until that's possible, the single machine solution is going
> to require some serious compute power (and lots of memory).


Hm. Actually, you could do this right now, by writing a view 'server' that
connects to an actual server over TCP (and a corresponding stub for the
other end). It would make a neat project, actually. The only real problem
would be convincing couch to spawn an appropriate number of threads for the
'real' parallelism available.
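
The stub idea above can be sketched roughly as follows. This is only an
illustration, not how couch or couchjs actually does anything: the function
names are made up, the host and port are placeholders, and it assumes the
remote end answers every request line with exactly one reply line.

```python
import socket
import sys

def relay(requests, responses, remote_in, remote_out):
    """Forward each request line to the remote end and copy back one reply line."""
    for line in requests:
        remote_out.write(line)
        remote_out.flush()
        reply = remote_in.readline()
        if not reply:  # remote side closed the connection
            break
        responses.write(reply)
        responses.flush()

def main(host, port):
    """Wire stdin/stdout to a TCP connection (host and port are placeholders)."""
    with socket.create_connection((host, port)) as sock:
        with sock.makefile("r", encoding="utf-8") as remote_in, \
             sock.makefile("w", encoding="utf-8") as remote_out:
            relay(sys.stdin, sys.stdout, remote_in, remote_out)
```

Couch would be configured to launch a script that calls main(), while the
process listening on the other end runs the real map/reduce code. A stub like
this does nothing about the parallelism problem mentioned above; it only moves
the work to another machine.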


>
> IMHO, this [potential] parallelization (in addition to all the
> benefits of document-centric storage) is what makes couch incredibly
> attractive from a scaling perspective.
>
> K.

Re: Getting only updates from a view

Posted by kowsik <ko...@gmail.com>.
The view server (couchjs) today is primarily a local process
interacting with couch through stdin/stdout. This is the one that does
all the map/reduce. I can envision at some point (not sure if and when
it's planned), the view servers executing on remote machines through
the exact same line-based JSON protocol (though using TCP) to heavily
parallelize the map/reduce process. The incremental set of documents
to be indexed can be easily chunked and distributed across this
cluster. Until that's possible, the single machine solution is going
to require some serious compute power (and lots of memory).

IMHO, this [potential] parallelization (in addition to all the
benefits of document-centric storage) is what makes couch incredibly
attractive from a scaling perspective.

K.
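
To make the line-based JSON protocol described above more concrete, here is a
toy view server in Python. It is a sketch under assumptions: the command names
(reset, add_fun, map_doc) are modelled on the commands couch sends to its query
server, but the convention that the function source is Python defining
`fun(doc)` is invented for this example (couchjs evaluates JavaScript), and
error handling is minimal.

```python
import json

class ViewServer:
    """Toy query server: one JSON command per line in, one JSON reply out."""

    def __init__(self):
        self.funs = []

    def handle(self, line):
        cmd, *args = json.loads(line)
        if cmd == "reset":
            # Forget all map functions registered so far.
            self.funs = []
            return json.dumps(True)
        if cmd == "add_fun":
            # Assumed convention: the source defines `fun(doc)` yielding
            # (key, value) pairs. exec() is unsafe for untrusted code.
            namespace = {}
            exec(args[0], namespace)
            self.funs.append(namespace["fun"])
            return json.dumps(True)
        if cmd == "map_doc":
            doc = args[0]
            # One list of [key, value] rows per registered map function.
            rows = [[list(pair) for pair in fun(doc)] for fun in self.funs]
            return json.dumps(rows)
        return json.dumps({"error": "unknown_command", "reason": cmd})
```

A real server would loop over sys.stdin, call handle() on each line, and write
each reply followed by a newline to sys.stdout. Because the exchange is just
lines of JSON, the same handler could sit behind a TCP socket instead.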

On Sun, Oct 26, 2008 at 7:08 PM, Julien Guimont
<ju...@msc-mobile.com> wrote:
> Thanks for the answer, pretty obvious.
>
> I would have 1000 to 10000 views to update periodically upon document
> updates. There would be 5-10 updates a second and more than 500k documents.
> Will couchdb scale in that case?
>
> Thank you,
> Julien.

Re: Getting only updates from a view

Posted by Chris Anderson <jc...@apache.org>.
Plus 1 for the idea of sandboxing each client application in its own database.

If you really are going to allow end users to write views, you should
at least read and understand the security implications of this:
http://peter.michaux.ca/article/8069

It sounds like your application could be a strong fit for couchdb, but
it might take some thought to figure out how best to use it.

Cheers,
Chris

On Mon, Oct 27, 2008 at 7:54 PM, Dean Landolt <de...@deanlandolt.com> wrote:
> Wow. That's a wholelotta client apps. In this case yeah, there may not be a
> great way to generalize the views since it sounds like you don't even
> control them, but from the little I've been able to glean, you may want to
> take some measures to sandbox this view creation somewhat. If all views go
> into the same design document, one view's change results in all views being
> rebuilt -- probably not what you want. So perhaps each client should get
> their own design doc where they can create their views...
>
> But I'd go a little further and suggest that if you have that many clients
> who need ultimate control over views, perhaps you may want to rethink the
> architecture. Letting clients design your views could get a little hairy --
> one rogue view could take down your whole system. I don't know anything
> about the data you're keeping, but perhaps you could expose it certain ways
> useful to as many clients as possible, but punt on all that processing. Let
> the clients deal with that -- and for those that really need their own
> views, just replicate them an instance of the db and they can add all the
> views they want.



-- 
Chris Anderson
http://jchris.mfdz.com

Re: Getting only updates from a view

Posted by Dean Landolt <de...@deanlandolt.com>.
Wow. That's a wholelotta client apps. In this case yeah, there may not be a
great way to generalize the views since it sounds like you don't even
control them, but from the little I've been able to glean, you may want to
take some measures to sandbox this view creation somewhat. If all views go
into the same design document, one view's change results in all views being
rebuilt -- probably not what you want. So perhaps each client should get
their own design doc where they can create their views...

But I'd go a little further and suggest that if you have that many clients
who need ultimate control over views, perhaps you may want to rethink the
architecture. Letting clients design your views could get a little hairy --
one rogue view could take down your whole system. I don't know anything
about the data you're keeping, but perhaps you could expose it certain ways
useful to as many clients as possible, but punt on all that processing. Let
the clients deal with that -- and for those that really need their own
views, just replicate them an instance of the db and they can add all the
views they want.
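
One way to picture the per-client design doc suggestion is the sketch below.
The `_design/client_<id>` naming scheme and the helper name are assumptions
made for illustration; the only point is that each client's views live in
their own design document, so changing one client's view does not touch the
others.

```python
def design_doc_for(client_id, views):
    """Build a design document holding only one client's views.

    `views` maps a view name to its JavaScript map function source.
    """
    return {
        "_id": f"_design/client_{client_id}",
        "views": {name: {"map": source} for name, source in views.items()},
    }
```

For example, design_doc_for("42", {"recent": "function(doc) { emit(doc._id, null); }"})
yields a document with the id "_design/client_42" that can be saved to the
database like any other document.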


On Mon, Oct 27, 2008 at 1:58 PM, Julien Guimont <
julien.guimont@msc-mobile.com> wrote:

> Hello,
>
> That is an option I do not completely understand right now. If there are
> blog posts on the subject, that would be great!
>
> My usage scenario is the following:
> I have many client applications (1000-10000).
> They send my application server what they want as a data set (a view).
> The application server adds the view to the design document, executes it,
> and returns the result to the client.
> Upon document updates, the application server refreshes the views and
> informs the client of new data (if any).
>
> Am I going in a good direction?
>
> Thank you,
> Julien.

Re: Getting only updates from a view

Posted by Julien Guimont <ju...@msc-mobile.com>.
Hello,

That is an option I do not completely understand right now. If there
are blog posts on the subject, that would be great!

My usage scenario is the following:
I have many client applications (1000-10000).
They send my application server what they want as a data set (a view).
The application server adds the view to the design document, executes
it, and returns the result to the client.
Upon document updates, the application server refreshes the views and
informs the client of new data (if any).

Am I going in a good direction?

Thank you,
Julien.

On 27-Oct-08, at 1:39 PM, Dean Landolt wrote:

> On Sun, Oct 26, 2008 at 10:08 PM, Julien Guimont <
> julien.guimont@msc-mobile.com> wrote:
>
>> Thanks for the answer, pretty obvious.
>>
>> I would have 1000 to 10000 views to update periodically upon document
>> updates. There would be 5-10 updates a second and more than 500k  
>> documents.
>> Will couchdb scale in that case?
>>
>> Thank you,
>> Julien.
>
>
> What kind of use case would require 10000 views? Couldn't there be a
> way to generalize some of your view functions with compound keys to
> give you what you need in a *much* smaller number of views?


Re: Getting only updates from a view

Posted by Dean Landolt <de...@deanlandolt.com>.
On Sun, Oct 26, 2008 at 10:08 PM, Julien Guimont <
julien.guimont@msc-mobile.com> wrote:

> Thanks for the answer, pretty obvious.
>
> I would have 1000 to 10000 views to update periodically upon document
> updates. There would be 5-10 updates a second and more than 500k documents.
> Will couchdb scale in that case?
>
> Thank you,
> Julien.


What kind of use case would require 10000 views? Couldn't there be a way to
generalize some of your view functions with compound keys to give you what
you need in a *much* smaller number of views?
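
As a rough illustration of the compound-key idea: instead of one view per
client, a single view can emit a key such as [client, updated_at], and each
client is then served by a key-range query. The sketch below only emulates
that in memory with sorted tuples. The field names `client` and `updated_at`
are hypothetical, and in CouchDB itself the keys would be JSON arrays, with
`{}` commonly used as the upper sentinel where this sketch uses infinity.

```python
from bisect import bisect_left, bisect_right

def build_index(docs):
    """Emulate a view whose map emits [client, updated_at] for every document."""
    rows = [((d["client"], d["updated_at"]), d["_id"]) for d in docs]
    return sorted(rows)

def query_range(index, startkey, endkey):
    """Return ids of rows with startkey <= key <= endkey, in key order."""
    keys = [key for key, _ in index]
    lo = bisect_left(keys, startkey)
    hi = bisect_right(keys, endkey)
    return [doc_id for _, doc_id in index[lo:hi]]
```

Calling query_range(index, ("acme", 100), ("acme", float("inf"))) returns
everything for the client "acme" updated at or after 100, so one shared view
answers the same question for every client.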

Re: Getting only updates from a view

Posted by Julien Guimont <ju...@msc-mobile.com>.
Thanks for the answer, pretty obvious.

I would have 1000 to 10000 views to update periodically upon document  
updates. There would be 5-10 updates a second and more than 500k  
documents. Will couchdb scale in that case?

Thank you,
Julien.

On 26-Oct-08, at 10:36 AM, Ayende Rahien <ay...@ayende.com> wrote:

> You create a view indexed by update date (or some other
> always-incrementing value). Then you can query the view by that value,
> starting from the last one you saw.

Re: Getting only updates from a view

Posted by Ayende Rahien <ay...@ayende.com>.
You create a view indexed by update date (or some other always-incrementing
value). Then you can query the view by that value, starting from the last
one you saw.
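
A minimal sketch of this suggestion, assuming every document carries a
hypothetical `updated_at` field that the application sets on each write
(CouchDB does not add one itself). The map function is ordinary view
JavaScript held in a string; the Python helper only builds the query path.
The design document and view names are placeholders, and the exact view URL
layout differs between CouchDB releases, so treat the path as an assumption.

```python
import json
from urllib.parse import urlencode

# Design document with one view keyed by the (assumed) updated_at field.
DESIGN_DOC = {
    "_id": "_design/changes",
    "views": {
        "by_updated_at": {
            "map": "function(doc) { if (doc.updated_at) emit(doc.updated_at, null); }"
        }
    },
}

def delta_query_path(db, last_seen):
    """Return a view path selecting rows whose key is >= last_seen."""
    # View keys are JSON values, so the startkey must be JSON-encoded.
    query = urlencode({"startkey": json.dumps(last_seen)})
    return f"/{db}/_design/changes/_view/by_updated_at?{query}"
```

The client remembers the largest key it has received and passes it as
last_seen on the next request. Since startkey is inclusive, the row for
last_seen itself comes back again and has to be skipped by the caller.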

On Sun, Oct 26, 2008 at 4:19 PM, Julien Guimont <
julien.guimont@msc-mobile.com> wrote:

> Hello,
>
> I am looking at CouchDB for a new project. So far it matches a lot of
> requirements that would require ugly hacks using a traditional DB.
>
> One requirement I have is to always have the delta results of a view (which
> documents matching the view have changed since the last update).
>
> I read http://wiki.apache.org/couchdb/RegeneratingViewsOnUpdate
>
> and I understood that the views are being reprocessed only with the
> documents that have changed. (Am I wrong?)
>
> Well, if I am right, how can I get the view results only for those changed
> documents?
>
> Thank you!
> Julien.
>