Posted to user@couchdb.apache.org by ara howard <ar...@gmail.com> on 2008/11/13 18:01:02 UTC
dirty reads - update strategies
what are people's strategies for dealing with the following scenario?
doc_a = get 'id_a'
doc_b = get 'id_b'
obj_c = { 'sum' : doc_a.x + doc_b.y }
put obj_c
this kind of thing is tricky even in a traditional RDBMS, since the
default transaction isolation level may or may not allow the application
to see an uncommitted write by another transaction.
the only way i can think of to get consistency from an op like the
above would be to do
bulk_put [ obj_c, doc_a, doc_b ]
in other words, if you are ever going to compute values from couch
docs to produce another doc, it would seem that it's required to put
*all* read information back in order to ensure that the sources have
not changed since the time that you read them. the issue with this,
of course, is that a result computed from many documents is going to
cause exponential slowdown since the potential for overlapping writes
will increase with the number of documents and also the size of
updates themselves will increase similarly.
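A minimal sketch of the bulk_put idea above, in Ruby to match the pseudocode. MemoryStore, get, put, bulk_put and ConflictError are hypothetical names for an in-memory stand-in, not a CouchDB client API, and revisions are plain integers rather than CouchDB's rev strings. Writing doc_a and doc_b back unchanged turns their revisions into preconditions: the whole batch is rejected if either source changed after it was read.

```ruby
# Hypothetical in-memory stand-in for a store with MVCC-style revisions.
# NOT a CouchDB client API; revisions are integers for simplicity.
class ConflictError < StandardError; end

class MemoryStore
  def initialize
    @docs = {}
  end

  # Returns a copy of the stored doc, including its '_rev'.
  def get(id)
    @docs.fetch(id).dup
  end

  # Single-doc write: a one-element bulk write.
  def put(doc)
    bulk_put([doc]).first
  end

  # All-or-nothing: every doc's '_rev' must match the stored revision
  # (nil for a new doc) before anything is written.
  def bulk_put(docs)
    docs.each do |doc|
      current = @docs[doc['_id']]
      unless doc['_rev'] == (current && current['_rev'])
        raise ConflictError, "conflict on #{doc['_id']}"
      end
    end
    docs.map do |doc|
      rev = (doc['_rev'] || 0) + 1
      @docs[doc['_id']] = doc.merge('_rev' => rev)
      rev
    end
  end
end

store = MemoryStore.new
store.put({ '_id' => 'id_a', 'x' => 1 })
store.put({ '_id' => 'id_b', 'y' => 2 })

doc_a = store.get('id_a')
doc_b = store.get('id_b')

# Another client updates id_a after we read it.
store.put(store.get('id_a').merge('x' => 10))

obj_c = { '_id' => 'obj_c', 'sum' => doc_a['x'] + doc_b['y'] }
begin
  # doc_a and doc_b are written back unchanged, so their revisions
  # act as preconditions for storing obj_c.
  store.bulk_put([obj_c, doc_a, doc_b])
rescue ConflictError => e
  puts e.message # prints "conflict on id_a"; obj_c was not stored
end
```

Whether a real CouchDB _bulk_docs request gives you this all-or-nothing behaviour depends on the server version, so treat this as a model of the idea rather than of CouchDB itself. It also shows the cost raised above: every source doc is rewritten and gets a new revision even though its content did not change.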
a solution i can imagine is something like
list = get 'some_view'
obj = computed_value_from list
obj[ '_depends_on' ] = list.map{|element| [element.id, element.rev]}
put obj
so basically a method to do a put with not only your rev, but also those
of 'n' dependent docs, where only the [id, rev] pairs for the dependent
docs need to be posted. am i making any sense here?
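A sketch of the proposed '_depends_on' check, again against a hypothetical in-memory store. CouchDB has no such feature; DependencyCheckingStore, all (standing in for 'some_view') and the field's semantics are assumptions that model the proposal. Only the [id, rev] pairs travel with the write; the put is rejected if any listed doc has moved on, and the error names the stale ids.

```ruby
# Hypothetical store modelling the proposed '_depends_on' check.
# CouchDB has no such feature; names and semantics are assumptions.
class ConflictError < StandardError; end

class DependencyCheckingStore
  def initialize
    @docs = {}
  end

  def get(id)
    @docs.fetch(id).dup
  end

  # Stand-in for the rows of 'some_view': copies of every stored doc.
  def all
    @docs.values.map(&:dup)
  end

  # Stores doc only if its own '_rev' is current AND every [id, rev]
  # pair in doc['_depends_on'] still matches the stored revision.
  # (A real server would have to do this check-and-write atomically.)
  def put(doc)
    current = @docs[doc['_id']]
    unless doc['_rev'] == (current && current['_rev'])
      raise ConflictError, "conflict on #{doc['_id']}"
    end
    stale = (doc['_depends_on'] || []).reject do |id, rev|
      @docs.key?(id) && @docs[id]['_rev'] == rev
    end
    unless stale.empty?
      raise ConflictError, "stale dependencies: #{stale.map(&:first).join(', ')}"
    end
    rev = (doc['_rev'] || 0) + 1
    @docs[doc['_id']] = doc.merge('_rev' => rev)
    rev
  end
end

store = DependencyCheckingStore.new
store.put({ '_id' => 'id_a', 'x' => 1 })
store.put({ '_id' => 'id_b', 'x' => 2 })

list = store.all
obj = { '_id' => 'total', 'sum' => list.sum { |doc| doc['x'] } }
# Only [id, rev] pairs are sent along, not the source documents.
obj['_depends_on'] = list.map { |doc| [doc['_id'], doc['_rev']] }
store.put(obj)
```

Compared with writing every source doc back, the request grows only by one small pair per dependency and the sources keep their revisions.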
cheers.
a @ http://codeforpeople.com/
--
we can deny everything, except that we have the possibility of being
better. simply reflect on that.
h.h. the 14th dalai lama
Re: dirty reads - update strategies
Posted by Paul Carey <pa...@gmail.com>.
This doesn't address your concern about large writes, but to avoid
inconsistent state across docs you could simply perform your write
> bulk_put [ obj_c, doc_a, doc_b ]
against a separate database, e.g. db_cached.
So you could query db_cached, knowing that obj_c is consistent with
doc_a and doc_b, while still being able to retrieve the up-to-date data
from db_live if you wish.
Paul
Re: dirty reads - update strategies
Posted by "ara.t.howard" <ar...@gmail.com>.
On Nov 13, 2008, at 10:44 AM, Ayende Rahien wrote:
> This is a hack, but you could use bulk_docs to do this, which would
> fail if a or b had already been updated. This would cause other items
> (that use a or b but don't change them) to fail to update.
indeed. but it's late: the reads may already be inconsistent since
they are done serially, and it's impossibly complex atm since the
offending docs cannot be identified (i know a change request is in now
to include this info in the error).
cheers.
Re: dirty reads - update strategies
Posted by Ayende Rahien <ay...@ayende.com>.
This is a hack, but you could use bulk_docs to do this, which would fail if
a or b had already been updated. This would cause other items (that use a
or b but don't change them) to fail to update.
Re: dirty reads - update strategies
Posted by "ara.t.howard" <ar...@gmail.com>.
On Nov 13, 2008, at 11:28 AM, Damien Katz wrote:
>
> Yes, I mean values as computed values. The main post shouldn't be
> updated with a comment count or anything computed like that. It's
> fine if comments have a reference to their parent, and it's fine if
> the comments are tagged as children of the post. This way, when the
> main post is opened, the comment count can be computed from a view,
> or when viewing a comment, the user is also shown the parent, and
> maybe subcomments if it's a threaded discussion.
okay that makes good sense - the same goes for an RDBMS, of course.
basically you're saying 'stay normalized.' i wasn't clear about your
meaning of 'values' - which clearly excludes 'ids.'
cheers.
Re: dirty reads - update strategies
Posted by Damien Katz <da...@apache.org>.
On Nov 13, 2008, at 1:10 PM, ara.t.howard wrote:
>
> On Nov 13, 2008, at 10:39 AM, Damien Katz wrote:
>
>> My answer is "Don't do that". Values in documents shouldn't depend
>> on values in other documents, that's a better fit for a relational
>> or OO DB. In your example though, CouchDB's views could be used to
>> compute the sums.
>
> i don't think that's realistic. consider something like the
> following:
>
> let's say we write a publishing system: users can create documents
> with content and tags. at the end of the month the editor is going
> to write a summary of the content from that month; obviously this
> summary should be tagged with the union of the tags from all
> summarized content - for later searching. regardless of whether we
> store the tags inside the document or outside of it we have quite a
> task - we need to get a consistent read of all content for the
> month, with all its tags, in order to properly construct the
> summary document with its aggregate tags. this isn't strict
> dependence - it's merely a read/write consistency issue which nearly
> any application is going to face. we can argue that it's not
> important that the summary of tags exactly mirrors the tags of its
> constituent parts, but that kind of thinking results not in an
> information store, but a collection of valueless data.
CouchDB views are a consistent snapshot of the database; your reports
should be generated from the views. The view APIs are the place to look
for better reporting capabilities.
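Applied to the tag example, the suggestion might look like the following Ruby emulation of a map/reduce view over one snapshot of the docs. It is only an illustration: real CouchDB views are defined in a design document (JavaScript by default), and a real reduce function must also handle re-reducing its own output. The 'month' and 'tags' fields and all method names here are hypothetical.

```ruby
require 'set'

# Hypothetical content docs; 'month' and 'tags' are illustrative fields.
CONTENT_DOCS = [
  { '_id' => 'p1', 'month' => '2008-11', 'tags' => %w[couchdb erlang] },
  { '_id' => 'p2', 'month' => '2008-11', 'tags' => %w[ruby couchdb] },
  { '_id' => 'p3', 'month' => '2008-10', 'tags' => %w[ruby] }
].freeze

# Map step: emit one [key, value] row per doc, keyed by month.
def map_content(doc)
  [[doc['month'], doc['tags']]]
end

# Reduce step: union of all tag lists emitted under one key.
def reduce_tags(tag_lists)
  tag_lists.reduce(Set.new) { |acc, tags| acc | tags }.to_a.sort
end

# Runs map over one snapshot of the docs, groups rows by key, reduces.
def tags_by_month(docs)
  docs.flat_map { |doc| map_content(doc) }
      .group_by(&:first)
      .transform_values { |rows| reduce_tags(rows.map(&:last)) }
end

p tags_by_month(CONTENT_DOCS)['2008-11'] # prints ["couchdb", "erlang", "ruby"]
```

The month's tag union comes from a single pass over one snapshot, so the summary can be tagged from that one query result rather than from a series of separate document reads.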
>
>
> anyhow, i think it's important to be able to agree upon best
> practices for this kind of operation. saying that values shouldn't
> depend on values in other documents is quite a statement - it means
> couch should not be used for any information store where the
> information value needs to grow recursively.
What I mean is you should never depend on the accuracy of computed
values in documents that are based on other documents, particularly
with replication.
> in my case we're modeling financial information which gets processed
> in increasingly sophisticated ways - where documents are inputs to
> processes which produce other documents. i can't think of an
> application that does not do the same thing: a blog comment depends
> on the blog post, a 'friends list' depends on the users, etc.
>
>
> are you referring to 'values' as different from 'ids' ?
Yes, I mean values as computed values. The main post shouldn't be
updated with a comment count or anything computed like that. It's fine
if comments have a reference to their parent, and it's fine if the
comments are tagged as children of the post. This way, when the main
post is opened, the comment count can be computed from a view, or when
viewing a comment, the user is also shown the parent, and maybe
subcomments if it's a threaded discussion.
-Damien
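A similar emulation of the comment count Damien describes: each comment holds only a reference to its parent in a hypothetical 'post_id' field, the post stores no count, and the count is derived at read time by a map step emitting [post_id, 1] and a reduce step summing per key. As above, this mimics the shape of a view in plain Ruby and is not a CouchDB API.

```ruby
# Hypothetical docs: each comment references its parent post via
# 'post_id'; the posts themselves store no comment count.
BLOG_DOCS = [
  { '_id' => 'post1', 'type' => 'post', 'title' => 'first post' },
  { '_id' => 'post2', 'type' => 'post', 'title' => 'second post' },
  { '_id' => 'c1', 'type' => 'comment', 'post_id' => 'post1' },
  { '_id' => 'c2', 'type' => 'comment', 'post_id' => 'post1' },
  { '_id' => 'c3', 'type' => 'comment', 'post_id' => 'post2' }
].freeze

# Map step: one [post_id, 1] row per comment; other docs emit nothing.
def map_comment(doc)
  doc['type'] == 'comment' ? [[doc['post_id'], 1]] : []
end

# Reduce step: add up the values emitted under one key.
def reduce_count(values)
  values.sum
end

# Per-post counts, computed at read time instead of stored on the post.
def comment_counts(docs)
  docs.flat_map { |doc| map_comment(doc) }
      .group_by(&:first)
      .transform_values { |rows| reduce_count(rows.map(&:last)) }
end

puts comment_counts(BLOG_DOCS).fetch('post1') # prints 2
```

Adding or deleting a comment touches only that comment's document; a post with no comments simply has no row in the result.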
Re: dirty reads - update strategies
Posted by "ara.t.howard" <ar...@gmail.com>.
On Nov 13, 2008, at 10:39 AM, Damien Katz wrote:
> My answer is "Don't do that". Values in documents shouldn't depend
> on values in other documents, that's a better fit for a relational
> or OO DB. In your example though, CouchDB's views could be used to
> compute the sums.
i don't think that's realistic. consider something like the following:
let's say we write a publishing system: users can create documents
with content and tags. at the end of the month the editor is going to
write a summary of the content from that month; obviously this summary
should be tagged with the union of the tags from all summarized
content - for later searching. regardless of whether we store the
tags inside the document or outside of it we have quite a task - we
need to get a consistent read of all content for the month, with all
its tags, in order to properly construct the summary document with
its aggregate tags. this isn't strict dependence - it's merely a
read/write consistency issue which nearly any application is going to
face. we can argue that it's not important that the summary of tags
exactly mirrors the tags of its constituent parts, but that kind of
thinking results not in an information store, but a collection of
valueless data.
anyhow, i think it's important to be able to agree upon best practices
for this kind of operation. saying that values shouldn't depend on
values in other documents is quite a statement - it means couch should
not be used for any information store where the information value needs
to grow recursively. in my case we're modeling financial information
which gets processed in increasingly sophisticated ways - where
documents are inputs to processes which produce other documents. i
can't think of an application that does not do the same thing: a blog
comment depends on the blog post, a 'friends list' depends on the
users, etc.
are you referring to 'values' as different from 'ids' ?
kind regards.
Re: dirty reads - update strategies
Posted by Nuno Job <nu...@gmail.com>.
jan is at codebits now :D
sorry for the off topic :P
Re: dirty reads - update strategies
Posted by Damien Katz <da...@apache.org>.
My answer is "Don't do that". Values in documents shouldn't depend on
values in other documents, that's a better fit for a relational or OO
DB. In your example though, CouchDB's views could be used to compute
the sums.
-Damien