Posted to user@couchdb.apache.org by Eric B <eb...@gmail.com> on 2014/10/01 20:02:03 UTC

How to store the delta between doc revisions?

I'm new to CouchDB and trying to figure out the best way to store a history
of changes for a document.

Originally, I thought the approach that made the most sense was to use
CouchDB's update functions, but I'm not entirely sure I can. Is there
some way to use an update function to modify or create a second document
in the process?

For example, suppose I have a document that contains notes for a client.
Every time I modify the notes document (i.e. add or delete lines), I want
to keep a record of the changes made to it. If there were a way to use
CouchDB's _rev fields for this, my problem would be solved, but since
CouchDB deletes non-current revisions on compaction, that is not an option.

So instead, I want to create a "history_log" document where I can just
store the delta between documents (as a patch, for example).

In order to do this, I need my existing document and my new document,
compare the changes, and write them to a history_log document. But I don't
see if or where I can do that within an update handler.

Is there something that can help me do this easily within CouchDB? Are
there patch or JSON compare functions I can access from within a
CouchDB handler?
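A minimal client-side sketch of the field-level delta described above, in Python. It assumes a flat comparison of top-level fields (nested values such as a list of note lines are compared as a whole), skips CouchDB metadata fields whose names start with "_", and uses a hypothetical history_log document shape. None of these functions are built into CouchDB.

```python
from typing import Any, Dict


def _fields(doc: Dict[str, Any]) -> Dict[str, Any]:
    # Drop CouchDB metadata such as _id, _rev and _attachments.
    return {k: v for k, v in doc.items() if not k.startswith("_")}


def compute_delta(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Shallow, field-level delta: nested values are compared as a whole."""
    before, after = _fields(old), _fields(new)
    return {
        "added": {k: after[k] for k in after.keys() - before.keys()},
        "removed": {k: before[k] for k in before.keys() - after.keys()},
        "changed": {
            k: {"old": before[k], "new": after[k]}
            for k in before.keys() & after.keys()
            if before[k] != after[k]
        },
    }


def apply_delta(old: Dict[str, Any], delta: Dict[str, Any]) -> Dict[str, Any]:
    """Replay a delta on top of the old version (metadata fields excluded)."""
    result = _fields(old)
    for k in delta["removed"]:
        result.pop(k, None)
    result.update(delta["added"])
    for k, change in delta["changed"].items():
        result[k] = change["new"]
    return result


def make_history_entry(doc_id: str, old: Dict[str, Any], new: Dict[str, Any],
                       timestamp: str) -> Dict[str, Any]:
    # Hypothetical shape for a separate "history_log" document.
    return {
        "type": "history_log",
        "doc_id": doc_id,
        "from_rev": old.get("_rev"),
        "timestamp": timestamp,
        "delta": compute_delta(old, new),
    }
```

Keeping both "old" and "new" for changed fields is a design choice that also allows walking the history backwards, at the cost of some extra space per entry.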

Thanks,

Eric

Fwd: How to store the delta between doc revisions?

Posted by Eric B <eb...@gmail.com>.
As you mentioned, the update_notif_handler and changes feed are triggered
after a document is persisted, so they can cause race conditions. Ideally,
I'm looking to trigger a handler just before the document is persisted.

I looked into the validate_doc_update function, but even if I store the
history log within the document (which I'm not opposed to), I can't seem to
modify the contents in the validate_doc_update function (which is
appropriate). So I'm still no further ahead in figuring out a central
place to do this.

So I am reduced to ensuring that every update handler I call creates a
history log, and that plain POSTs/PUTs of the document do so as well. That
means I am putting code in several different places to perform the same
task, which is error-prone and leads to fragmentation.

Unless I am missing something?

Thanks,

Eric
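For the after-the-fact approach (following the changes feed), a rough Python sketch of turning one parsed `_changes?include_docs=true` response into history entries. The feed lists each changed document once, with its latest revision, so edits made between two polls are skipped; the sketch assumes linear histories without conflicts, where the number before the dash in `_rev` grows by one per edit, and uses a jump in that number to flag missed revisions. The function names and entry shape are hypothetical, and the HTTP request itself is left out.

```python
from typing import Any, Dict, List, Optional, Tuple


def _generation(rev: Optional[str]) -> int:
    # A CouchDB rev looks like "3-917fa23..."; the prefix counts edits.
    return int(rev.split("-", 1)[0]) if rev else 0


def history_from_changes(
    response: Dict[str, Any], last_seen: Dict[str, str]
) -> Tuple[List[Dict[str, Any]], Any]:
    """Convert one parsed _changes response into history entries.

    last_seen maps doc id -> last revision recorded; it is updated in place.
    Returns the entries and the last_seq to pass as ?since= on the next poll.
    """
    entries = []
    for row in response["results"]:
        doc_id = row["id"]
        rev = row["changes"][0]["rev"]
        previous = last_seen.get(doc_id)
        entries.append({
            "doc_id": doc_id,
            "rev": rev,
            "previous_rev": previous,
            # Intermediate revisions made between two polls are not listed.
            "missed_revisions": max(_generation(rev) - _generation(previous) - 1, 0),
            "deleted": row.get("deleted", False),
            "doc": row.get("doc"),  # present only with include_docs=true
        })
        last_seen[doc_id] = rev
    return entries, response["last_seq"]
```

Even when a gap is detected, the skipped revision bodies may already be gone after compaction, which is why this approach can only report missed revisions, not recover them.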

On Wed, Oct 1, 2014 at 3:30 PM, Alexander Shorin <kx...@gmail.com> wrote:

> Unfortunately, no. At least not completely. You can create a
> validate_doc_update function that verifies that every newly stored
> document contains some specific data (like the previous document version,
> which validate_doc_update also has access to), but all this leads to
> storing the history log inside the document itself. If you want to track
> it separately, the changes feed and update_notification_handler are your
> friends, but race conditions can happen (especially if compaction gets
> triggered), so there will always be a chance for you to miss some
> revision.
> --
> ,,,^..^,,,
>
>
> On Wed, Oct 1, 2014 at 11:18 PM, Eric B <eb...@gmail.com> wrote:
> > Thanks for the valid points. But either way (whether through patches or
> > storing the full previous revision), is there a mechanism in CouchDB by
> > which I can require all calls to go through an update handler? In a way,
> > I'm looking more for an update interceptor: something that runs just
> > before a document is actually persisted to the DB, and that is always
> > executed.
> >
> > Thanks,
> >
> > Eric
> >
> >
> > On Wed, Oct 1, 2014 at 3:03 PM, Alexander Shorin <kx...@gmail.com> wrote:
> >
> >> Storing patches is fine as long as you are sure that no single patch
> >> will ever get deleted. Otherwise you could easily find your whole
> >> history broken. Obvious, but it is the thing to remember when picking
> >> this way of history management. Storing full document copies per
> >> revision is a more solid solution in that case: you can skip or lose
> >> one or several revisions and be fine, but it also consumes much more
> >> disk space. Trade-offs are everywhere; pick the one that suits you.
> >> --
> >> ,,,^..^,,,
> >>
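To make the patch-versus-snapshot trade-off concrete, a small Python sketch. It assumes a hypothetical patch format of {"gen", "set", "unset"} keyed by revision generation: replaying patches stops at the first missing one, while full snapshots can each be read on their own.

```python
from typing import Any, Dict, List, Optional


def rebuild_from_patches(base: Dict[str, Any], patches: List[Dict[str, Any]],
                         target_gen: int) -> Dict[str, Any]:
    """Replay patches for generations 2..target_gen on top of generation 1."""
    by_gen = {p["gen"]: p for p in patches}
    doc = dict(base)
    for gen in range(2, target_gen + 1):
        patch = by_gen.get(gen)
        if patch is None:
            # A single lost patch makes this and every later version unrecoverable.
            raise LookupError(f"patch for generation {gen} is missing")
        for key in patch["unset"]:
            doc.pop(key, None)
        doc.update(patch["set"])
    return doc


def version_from_snapshots(snapshots: Dict[int, Dict[str, Any]],
                           target_gen: int) -> Optional[Dict[str, Any]]:
    """Full copies stand alone: losing one snapshot does not affect the others."""
    return snapshots.get(target_gen)
```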

Re: How to store the delta between doc revisions?

Posted by "Florian Westreicher Bakk.techn." <st...@meredrica.org>.
What you want to do is impossible when replication is in the mix, due to the CAP theorem.

If you really need functionality like this, consider using an SQL database with triggers and transactions. You don't need to keep everything in the same database, or even in the same technology.
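A minimal sketch of the trigger-based alternative, using SQLite through Python's standard sqlite3 module as a stand-in for "SQL with triggers and transactions". The table and trigger names are hypothetical. Because the trigger fires inside the same transaction as the UPDATE, the change and its history row are committed or rolled back together.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE notes (
            id   INTEGER PRIMARY KEY,
            body TEXT NOT NULL
        );
        CREATE TABLE notes_history (
            note_id    INTEGER NOT NULL,
            old_body   TEXT,
            new_body   TEXT,
            changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        -- Fires inside the same transaction as the UPDATE that triggered it.
        CREATE TRIGGER notes_audit AFTER UPDATE OF body ON notes
        BEGIN
            INSERT INTO notes_history (note_id, old_body, new_body)
            VALUES (OLD.id, OLD.body, NEW.body);
        END;
    """)
    return conn
```

Wrapping each write in `with conn:` commits on success and rolls back on an exception, so a failed update leaves neither the new body nor a history row behind.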

On October 2, 2014 3:56:50 PM CEST, "Sanjuan, Hector" <he...@here.com> wrote:
>All of it is taken care of on the client side.
>
>I don't store deltas/patch files per se; I actually store full
>"previous" and "current" versions of the doc(s). The client should be
>able to produce diffs when needed, in whatever format is required.
>
>You could implement caching mechanisms if needed (memcache-like if you
>want). In my case documents are fairly small and I am not particularly
>worried about the delay introduced by an extra GET.
>
>As you see, there is nothing especially clever in my approach, and it's
>quite lousy in many respects. It does not care much about consistency,
>e.g. if a PUT succeeds, the subsequent transaction POST might fail. And
>with replication enabled, two people could edit a document in
>conflicting ways and each of them would get a transaction record, even
>though one of their changes will get discarded in the conflict
>resolution. And, well, a whole universe of failures can happen when
>editing multiple docs for one transaction. So this is more of a simple
>paper trail, which I keep for some months and then delete anyway. If
>you aim for a fully consistent, race-condition-proof solution, it is
>going to be difficult (possibly impossible in a multi-master
>scenario). Perhaps with update handlers you can reach a compromise
>solution, but I'm not sure how that would work on multi-node setups
>either.
>
>H
-- 
Sent from Kaiten Mail. Please excuse my brevity.

Re: How to store the delta between doc revisions?

Posted by "Sanjuan, Hector" <he...@here.com>.
All of it is taken care of on the client side.

I don't store deltas/patch files per se; I actually store full "previous" and "current" versions of the doc(s). The client should be able to produce diffs when needed, in whatever format is required.

You could implement caching mechanisms if needed (memcache-like if you want). In my case documents are fairly small and I am not particularly worried about the delay introduced by an extra GET.

As you see, there is nothing especially clever in my approach, and it's quite lousy in many respects. It does not care much about consistency, e.g. if a PUT succeeds, the subsequent transaction POST might fail. And with replication enabled, two people could edit a document in conflicting ways and each of them would get a transaction record, even though one of their changes will get discarded in the conflict resolution. And, well, a whole universe of failures can happen when editing multiple docs for one transaction. So this is more of a simple paper trail, which I keep for some months and then delete anyway. If you aim for a fully consistent, race-condition-proof solution, it is going to be difficult (possibly impossible in a multi-master scenario). Perhaps with update handlers you can reach a compromise solution, but I'm not sure how that would work on multi-node setups either.

H
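A rough Python sketch of the client-side paper trail described above: GET the current version, PUT the new one, then POST a transaction doc holding full "previous" and "current" copies. InMemoryStore is a hypothetical stand-in for the two CouchDB databases, and its revision ids are fake. The two writes are separate requests, so if the POST fails the document change stays and the trail has a gap.

```python
import copy
import time
from typing import Any, Callable, Dict, List, Optional


class InMemoryStore:
    """Hypothetical stand-in for a documents database plus a transactions database."""

    def __init__(self) -> None:
        self.docs: Dict[str, Dict[str, Any]] = {}
        self.transactions: List[Dict[str, Any]] = []

    def get(self, doc_id: str) -> Optional[Dict[str, Any]]:
        doc = self.docs.get(doc_id)
        return copy.deepcopy(doc) if doc is not None else None

    def put(self, doc: Dict[str, Any]) -> str:
        current = self.docs.get(doc["_id"])
        if current is not None and current["_rev"] != doc.get("_rev"):
            raise ValueError("conflict")  # CouchDB answers 409 Conflict here
        generation = int(current["_rev"].split("-", 1)[0]) + 1 if current else 1
        stored = copy.deepcopy(doc)
        stored["_rev"] = f"{generation}-x"  # fake rev, not a real CouchDB hash
        self.docs[doc["_id"]] = stored
        return stored["_rev"]

    def post_transaction(self, txn: Dict[str, Any]) -> None:
        self.transactions.append(copy.deepcopy(txn))


def save_with_paper_trail(store: InMemoryStore, doc: Dict[str, Any], user: str,
                          now: Callable[[], float] = time.time) -> str:
    previous = store.get(doc["_id"])   # the extra GET on every write
    new_rev = store.put(doc)           # request 1: write the document
    current = dict(doc, _rev=new_rev)
    # Request 2 is independent: if it fails, request 1 is NOT undone.
    store.post_transaction({
        "type": "transaction",
        "user": user,
        "timestamp": now(),
        "doc_id": doc["_id"],
        "previous": previous,
        "current": current,
    })
    return new_rev
```

Diffs in any format can then be computed from "previous" and "current" when someone actually asks for them.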
________________________________________
From: Eric B <eb...@gmail.com>
Sent: Thursday, October 2, 2014 15:17
To: user@couchdb.apache.org
Subject: Re: How to store the delta between doc revisions?

On Thu, Oct 2, 2014 at 4:16 AM, Sanjuan, Hector <he...@here.com> wrote:

> I manage this outside CouchDB. I have a separate database for
> "transaction" docs which store things like the date a modification occurred
> and the resources that changed and how (one transaction can account for
> changes for several docs if it happened to be triggered by the same
> operation).
>

Can you elaborate on how you do this? I presume it must all be taken care
of on the client side? I haven't found any way to accomplish something
like this via update handlers.

> The main objective is to be able to figure out who touched a doc, when, and
> what change was likely introduced (we don't expect to revert/restore old
> revisions too often, although we could).
>

So I presume you only store patches between revs? Do you actually use
something to produce a true patch file, or just key/value pairs? e.g.
field1=new value, field2=new value, etc.


> It has an overhead (every write triggers a GET to fetch the last revision)
> and doesn't bother much about race conditions or strict history consistency
> (if you do bother too much about these you lose many advantages of the
> NoSQL model), but it is really simple to implement (and there is no need to
> debug code that runs inside CouchDB).
>

Have you considered maintaining a local cache to avoid an additional GET
every time? i.e. upon the original GET, cache the data and then check the
cache whenever a write is executed.

I have considered this system, but without multi-document transactions,
there is no way to ensure consistency (i.e. if the document update
succeeds and the history log write fails, it is too difficult to roll back
the doc update). And if only storing deltas, missing a rev would make it
impossible to rebuild the history of any document. Additionally, there is
no way to use update handlers effectively, for the same reason as above.
The history log would have to be written only upon success of the update
handler, at which point the write itself may or may not have succeeded.
Plus, it is more difficult to retrieve the older rev of the doc that was
just updated.

Unless I am making things too complicated?

Thanks,

Eric





Re: How to store the delta between doc revisions?

Posted by Eric B <eb...@gmail.com>.
On Thu, Oct 2, 2014 at 4:16 AM, Sanjuan, Hector <he...@here.com> wrote:

> I manage this outside CouchDB. I have a separate database for
> "transaction" docs which store things like the date a modification occurred
> and the resources that changed and how (one transaction can account for
> changes for several docs if it happened to be triggered by the same
> operation).
>

Can you elaborate on how you do this? I presume it must all be taken care
of on the client side? I haven't found any way to accomplish something
like this via update handlers.

> The main objective is to be able to figure out who touched a doc, when, and
> what change was likely introduced (we don't expect to revert/restore old
> revisions too often, although we could).
>

So I presume you only store patches between revs? Do you actually use
something to produce a true patch file, or just key/value pairs? e.g.
field1=new value, field2=new value, etc.


> It has an overhead (every write triggers a GET to fetch the last revision)
> and doesn't bother much about race conditions or strict history consistency
> (if you do bother too much about these you lose many advantages of the
> NoSQL model), but it is really simple to implement (and there is no need to
> debug code that runs inside CouchDB).
>

Have you considered maintaining a local cache to avoid an additional GET
every time? i.e. upon the original GET, cache the data and then check the
cache whenever a write is executed.
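The cache idea can be sketched in Python as follows, assuming a hypothetical fetch function that performs the GET. The cached copy is only trusted when its _rev matches the _rev the new document is based on; otherwise the client falls back to a GET.

```python
import copy
from typing import Any, Callable, Dict, Optional

Doc = Dict[str, Any]


class PreviousVersionCache:
    """Remembers the last version the client read or wrote for each document."""

    def __init__(self, fetch: Callable[[str], Optional[Doc]]) -> None:
        self._fetch = fetch            # hypothetical: performs GET /db/<doc_id>
        self._cache: Dict[str, Doc] = {}
        self.fetches = 0               # counts real GETs, to show the savings

    def remember(self, doc: Doc) -> None:
        self._cache[doc["_id"]] = copy.deepcopy(doc)

    def read(self, doc_id: str) -> Optional[Doc]:
        self.fetches += 1
        doc = self._fetch(doc_id)
        if doc is not None:
            self.remember(doc)
        return doc

    def previous_for(self, new_doc: Doc) -> Optional[Doc]:
        """Return the version new_doc is meant to replace, avoiding a GET if possible."""
        cached = self._cache.get(new_doc["_id"])
        if cached is not None and cached.get("_rev") == new_doc.get("_rev"):
            return copy.deepcopy(cached)      # same base revision: cache hit
        return self.read(new_doc["_id"])      # missing or stale: fall back to a GET
```

After a successful PUT, calling remember() with the new _rev keeps the cache current. If someone else wrote in between, the refetched version is newer than the one being edited, and the PUT would be rejected with a conflict anyway.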

I have considered this system, but without multi-document transactions,
there is no way to ensure consistency (i.e. if the document update
succeeds and the history log write fails, it is too difficult to roll back
the doc update). And if only storing deltas, missing a rev would make it
impossible to rebuild the history of any document. Additionally, there is
no way to use update handlers effectively, for the same reason as above.
The history log would have to be written only upon success of the update
handler, at which point the write itself may or may not have succeeded.
Plus, it is more difficult to retrieve the older rev of the doc that was
just updated.

Unless I am making things too complicated?

Thanks,

Eric





Re: How to store the delta between doc revisions?

Posted by "Sanjuan, Hector" <he...@here.com>.
Hi,

I manage this outside Couchdb. I have a separate database for "transaction" docs which store things like the date a modification occurred and the resources that changed and how (one transaction can account for changes for several docs if it happened to be triggered by the same operation).

Then, with the help of some views, it is easy to collect the transactions affecting a particular document ordered by timestamp, but also transactions affecting documents of a particular type, or transactions older than a certain date (in case you want to free space at some point).
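
The views themselves would be JavaScript map functions in a design doc, for example one emitting [doc_id, timestamp] for each change, queried with startkey=[id] and endkey=[id, {}]. As a rough illustration of how that composite key orders rows, here is a plain-Python emulation (no CouchDB involved) over transaction docs shaped like hypothetical {"type": "transaction", "timestamp": ..., "changes": [{"doc_id": ..., "delta": ...}]} records; the function names are made up.

```python
def map_transactions(tx_docs):
    """Emulate a view emitting one ([doc_id, timestamp], value) row per change.

    In CouchDB this would be a JavaScript map function; here rows are plain
    (key, value) tuples, sorted by key the way a view index keeps them.
    """
    rows = []
    for tx in tx_docs:
        if tx.get("type") != "transaction":
            continue  # the map function would skip non-transaction docs
        for change in tx.get("changes", []):
            key = (change["doc_id"], tx["timestamp"])
            rows.append((key, {"author": tx.get("author"), "delta": change["delta"]}))
    rows.sort(key=lambda row: row[0])
    return rows


def history_for(rows, doc_id):
    """Rows for one document, oldest first (like startkey=[id], endkey=[id, {}])."""
    return [row for row in rows if row[0][0] == doc_id]


def older_than(rows, cutoff):
    """Rows with a timestamp before cutoff, e.g. candidates for cleanup."""
    return [row for row in rows if row[0][1] < cutoff]
```

ISO-8601 timestamps written in one UTC format sort correctly as strings, which is what makes the composite key useful. CouchDB collates view keys with Unicode rules rather than Python's code-point order, so this emulation only matches for simple ASCII ids; an efficient "older than" query would also need its own view keyed on the timestamp first.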

The main objective is to be able to figure out who touched a doc, when, and what change was likely introduced (we don't expect to revert/restore old revisions too often, although we could).

It has some overhead (every write triggers a GET to fetch the last revision) and doesn't worry much about race conditions or strict history consistency (if you worry too much about these, you lose many advantages of the NoSQL model), but it is really simple to implement, and there is no need to debug code that runs inside CouchDB.

Keep in mind that CouchDB's compromises on consistency (in exchange for replication, speed, etc.) make it unsuitable for maintaining a fully consistent history the way others, say Git, would.

H


________________________________________
From: Alexander Shorin <kx...@gmail.com>
Sent: Wednesday, October 1, 2014 22:23
To: user@couchdb.apache.org
Subject: Re: How to store the delta between doc revisions?

That's right: validate_doc_update cannot modify the document being stored.
But it can check whether the previous version is included in the history log
stored within the updated document, which is what your update handler would
actually be doing. So clients would have to use your update handlers, or
implement the same logic on their side, to pass validation.
--
,,,^..^,,,


Re: How to store the delta between doc revisions?

Posted by Alexander Shorin <kx...@gmail.com>.
That's right: validate_doc_update cannot modify the document being stored.
But it can check whether the previous version is included in the history log
stored within the updated document, which is what your update handler would
actually be doing. So clients would have to use your update handlers, or
implement the same logic on their side, to pass validation.
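
To make the point concrete, here is a plain-Python sketch of the logic an update handler and any direct client would share, plus the rule a validate_doc_update could enforce. In CouchDB both would be JavaScript in a design doc (validate_doc_update receives newDoc, oldDoc, userCtx and secObj, and rejects a write by throwing {forbidden: ...}); the "history" field name and every function here are hypothetical.

```python
import copy

HISTORY_FIELD = "history"  # hypothetical name for the in-document log


class Forbidden(Exception):
    """Stands in for throw({forbidden: ...}) in a JavaScript validate_doc_update."""


def snapshot(doc):
    """Copy of a revision without _rev and without the history log itself."""
    return {k: copy.deepcopy(v) for k, v in doc.items() if k not in ("_rev", HISTORY_FIELD)}


def append_history(old_doc, new_doc):
    """What the update handler, or any client writing directly, would do.

    Carries the existing log forward and appends a snapshot of old_doc.
    old_doc is None when the document is being created.
    """
    result = copy.deepcopy(new_doc)
    if old_doc is None:
        result[HISTORY_FIELD] = []
    else:
        result[HISTORY_FIELD] = copy.deepcopy(old_doc.get(HISTORY_FIELD, [])) + [snapshot(old_doc)]
    return result


def check_history(new_doc, old_doc):
    """The rule a validate_doc_update function could enforce on every write."""
    if old_doc is None:
        return  # creation: there is no previous version to record
    expected = old_doc.get(HISTORY_FIELD, []) + [snapshot(old_doc)]
    if new_doc.get(HISTORY_FIELD) != expected:
        raise Forbidden("history must end with a snapshot of the previous version")
```

The log is kept flat (each snapshot excludes the log itself), but it still grows with every edit, so the document gets larger over time; that is the cost of keeping the history inside a single document.
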
--
,,,^..^,,,


On Wed, Oct 1, 2014 at 11:45 PM, Eric Benzacar <er...@benzacar.ca> wrote:
> As you mentioned, the update_notif_handler and changes feed are things that
> are triggered after a document is persisted, so it can cause race
> conditions.  Ideally, I'm looking to trigger a handler just before it is
> persisted.
>
> I looked into the validate_doc_update function, but even if I want to store
> the history log within the document (not opposed to it), I can't seem to
> modify the contents in the validate_doc_update function (which is
> appropriate).  So I'm still no further ahead in figuring out a central
> place to do this.
>
> So then I am reduced to ensuring that every updateHandler I call creates a
> history log, and that every POST/PUT of the document does so as well. This
> means I am putting code in several different places to perform the same
> task, which is error-prone and leads to fragmentation.
>
> Unless I am missing something?
>
> Thanks,
>
> Eric

Re: How to store the delta between doc revisions?

Posted by Eric Benzacar <er...@benzacar.ca>.
As you mentioned, the update_notif_handler and changes feed are things that
are triggered after a document is persisted, so it can cause race
conditions.  Ideally, I'm looking to trigger a handler just before it is
persisted.

I looked into the validate_doc_update function, but even if I want to store
the history log within the document (not opposed to it), I can't seem to
modify the contents in the validate_doc_update function (which is
appropriate).  So I'm still no further ahead in figuring out a central
place to do this.

So then I am reduced to ensuring that every updateHandler I call creates a
history log, and that every POST/PUT of the document does so as well. This
means I am putting code in several different places to perform the same
task, which is error-prone and leads to fragmentation.

Unless I am missing something?

Thanks,

Eric

On Wed, Oct 1, 2014 at 3:30 PM, Alexander Shorin <kx...@gmail.com> wrote:

> Unfortunately no, at least not completely. You can create a
> validate_doc_update function that verifies that every newly stored document
> contains some specific data (like the previous document version, to which
> validate_doc_update also has access), but all this leads to storing the
> history log inside a single document. If you want to track it
> separately, the changes feed and update_notification_handler are your
> friends, but race conditions can happen (especially if compaction gets
> triggered), so there will always be a chance that you miss some revision.
> --
> ,,,^..^,,,

Re: How to store the delta between doc revisions?

Posted by Alexander Shorin <kx...@gmail.com>.
Unfortunately no, at least not completely. You can create a
validate_doc_update function that verifies that every newly stored document
contains some specific data (like the previous document version, to which
validate_doc_update also has access), but all this leads to storing the
history log inside a single document. If you want to track it
separately, the changes feed and update_notification_handler are your
friends, but race conditions can happen (especially if compaction gets
triggered), so there will always be a chance that you miss some revision.
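
The missed-revision risk can at least be detected. CouchDB revision ids have the form N-hash, where N grows by one with each edit, and the changes feed by default lists each changed document once, with its current revision, so a follower that falls behind never sees the intermediate ones. A minimal sketch in plain Python, assuming rows shaped like those of a _changes response and a hypothetical last_seen map kept by the follower; it ignores conflicting branches from replication.

```python
def rev_generation(rev):
    """Generation number of a CouchDB revision id such as "3-917fa2..."."""
    return int(rev.split("-", 1)[0])


def find_gaps(last_seen, rows):
    """Report revision generations a changes-feed follower never saw.

    last_seen maps doc id -> last generation processed and is updated in place.
    rows follow the shape of _changes results: {"id": ..., "changes": [{"rev": ...}]}.
    Returns {doc_id: [missed generations]}; conflicting branches are ignored.
    """
    gaps = {}
    for row in rows:
        gen = rev_generation(row["changes"][0]["rev"])
        previous = last_seen.get(row["id"], 0)
        missed = list(range(previous + 1, gen))
        if missed:
            gaps[row["id"]] = missed
        last_seen[row["id"]] = gen
    return gaps
```

Detecting a gap does not recover the lost content: once compaction has run, the bodies of those intermediate revisions are gone, which is exactly the limitation described above.
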
--
,,,^..^,,,


On Wed, Oct 1, 2014 at 11:18 PM, Eric B <eb...@gmail.com> wrote:
> Thanks for the valid points.  But either way (whether through patches or
> storing the full previous revision), is there a mechanism in CouchDB in
> which I can require all calls to trigger an updateHandler?  In a way, I'm
> looking more for an update interceptor; something to be run just before a
> document is actually persisted to the DB, but that is always executed.
>
> Thanks,
>
> Eric

Re: How to store the delta between doc revisions?

Posted by Eric B <eb...@gmail.com>.
Thanks for the valid points.  But either way (whether through patches or
storing the full previous revision), is there a mechanism in CouchDB in
which I can require all calls to trigger an updateHandler?  In a way, I'm
looking more for an update interceptor; something to be run just before a
document is actually persisted to the DB, but that is always executed.

Thanks,

Eric


On Wed, Oct 1, 2014 at 3:03 PM, Alexander Shorin <kx...@gmail.com> wrote:

> Storing patches is fine as long as you're sure that no single patch will
> get accidentally deleted. Otherwise you could easily find your whole history
> broken. Obvious, but it is the thing to remember when picking this
> approach to history management. Storing a full document copy per revision
> is a more robust solution in that case: you can skip or lose one
> or several revisions and be fine, but it also consumes much more disk
> space. Trade-offs are everywhere; pick the one that suits you.
> --
> ,,,^..^,,,

Re: How to store the delta between doc revisions?

Posted by Alexander Shorin <kx...@gmail.com>.
Storing patches is fine as long as you're sure that no single patch will
get accidentally deleted. Otherwise you could easily find your whole history
broken. Obvious, but it is the thing to remember when picking this
approach to history management. Storing a full document copy per revision
is a more robust solution in that case: you can skip or lose one
or several revisions and be fine, but it also consumes much more disk
space. Trade-offs are everywhere; pick the one that suits you.
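
A small sketch of why a missing patch breaks everything after it, assuming flat JSON documents and a deliberately simple, hypothetical "set/unset" patch format (not RFC 6902 JSON Patch). Replaying the full chain rebuilds the latest version; dropping one patch in the middle silently produces a wrong document, whereas with full copies per revision, losing one copy leaves every other one intact.

```python
import copy


def make_patch(old, new):
    """Hypothetical field-level patch from old to new (flat documents only)."""
    return {
        "set": {k: copy.deepcopy(v) for k, v in new.items() if k not in old or old[k] != v},
        "unset": sorted(k for k in old if k not in new),
    }


def apply_patch(doc, patch):
    """Return a new document with the patch applied; doc is left unchanged."""
    result = copy.deepcopy(doc)
    for key in patch["unset"]:
        result.pop(key, None)
    result.update(copy.deepcopy(patch["set"]))
    return result


def replay(base, patches):
    """Rebuild a later version by applying patches to base, in order."""
    doc = base
    for patch in patches:
        doc = apply_patch(doc, patch)
    return doc


if __name__ == "__main__":
    versions = [
        {"notes": ["a"]},
        {"notes": ["a", "b"]},
        {"notes": ["a", "b"], "owner": "eric"},
        {"notes": ["a", "b", "c"], "owner": "eric"},
    ]
    patches = [make_patch(versions[i], versions[i + 1]) for i in range(3)]
    print(replay(versions[0], patches) == versions[3])                   # True: full chain
    print(replay(versions[0], [patches[0], patches[2]]) == versions[3])  # False: one patch lost
```

Here the dropped patch is the one that added the "owner" field, so the replayed document simply lacks that field, and nothing in the remaining patches signals the loss.
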
--
,,,^..^,,,


On Wed, Oct 1, 2014 at 10:02 PM, Eric B <eb...@gmail.com> wrote:
> I'm new to CouchDB and trying to figure out the best way to store a history
> of changes for a document.
>
> Originally, I thought the thing that made the most sense was to use the
> update function of CouchDB, but I'm not entirely sure I can. Is there some
> way to use the update function and modify/create a second document in the
> process?
>
> For example, say I have a document which contains notes for a client.
> Every time I modify the notes document (i.e. add or delete lines), I want
> to keep track of the changes made to it. If there were a way to use
> CouchDB's rev fields for this, my problem would be solved, but since
> CouchDB deletes non-current revs upon compaction, that is not an option.
>
> So instead, I want to create a "history_log" document, where I can just
> store the delta between documents (as a patch, for example).
>
> In order to do this, I need to take my existing document and my new
> document, compare the changes, and write them to a history_log document.
> But I don't see if/where I can do that within an update handler.
>
> Is there something that can help me do this easily within CouchDB? Are
> there patch or JSON compare functions I can have access to from within a
> CouchDB handler?
>
> Thanks,
>
> Eric