Posted to user@couchdb.apache.org by Mark Hahn <ma...@hahnca.com> on 2012/09/24 23:16:25 UTC

following fast doc updates

If I update a particular doc multiple times rapidly, is each update
guaranteed to show up in a continuous changes feed?  I am worried that the
change feed will be optimized to just show the latest value of a doc with
multiple updates.  This would break my logic.

Re: following fast doc updates

Posted by Mark Hahn <ma...@hahnca.com>.

You're right, the disk activity would probably be unchanged.  But it would
still require keeping the list of version numbers for every doc in memory
like my current scheme.

On Tue, Sep 25, 2012 at 7:02 PM, Jens Alfke <je...@couchbase.com> wrote:

>
> On Sep 25, 2012, at 5:59 PM, Mark Hahn <ma...@hahnca.com> wrote:
>
> > I am already using a scheme that requires me to store the latest revision
> > number for every doc in memory.  Your scheme would also require that and
> > it would cause extra reads.  Correct me if I'm wrong.
>
> Sure, it requires extra reads if the _changes feed skipped a revision, but
> it seems like the skipped revision and the read cancel out, so it’s the
> same amount of work as if you’d gotten every revision.
>
> Is any of this worse than rewriting your app? :)
>
> —Jens

Re: following fast doc updates

Posted by Jens Alfke <je...@couchbase.com>.

On Sep 25, 2012, at 5:59 PM, Mark Hahn <ma...@hahnca.com> wrote:

> I am already using a scheme that requires me to store the latest revision
> number for every doc in memory.  Your scheme would also require that and it
> would cause extra reads.  Correct me if I'm wrong.

Sure, it requires extra reads if the _changes feed skipped a revision, but it seems like the skipped revision and the read cancel out, so it’s the same amount of work as if you’d gotten every revision.

Is any of this worse than rewriting your app? :)

—Jens

Re: following fast doc updates

Posted by Mark Hahn <ma...@hahnca.com>.

Thanks for the response.

> you can fetch that revision with its history,

I am already using a scheme that requires me to store the latest revision
number for every doc in memory.  Your scheme would also require that and it
would cause extra reads.  Correct me if I'm wrong.

On Tue, Sep 25, 2012 at 5:46 PM, Jens Alfke <je...@couchbase.com> wrote:

>
> On Sep 25, 2012, at 11:31 AM, Mark Hahn <mark@hahnca.com> wrote:
>
> > > The _changes feed only ever shows leaf revisions
> >
> > AARRGGHH.  I am so screwed.  I have been working on a scheme that relies on
> > tracking every change.  And as everyone knows there is normally no way to
> > find out what changed in a doc.  I am going to have to add a history of
> > changes to each doc which is not only wasteful, but a pain to implement.
>
> Seems like this should be doable. When the _changes feed says a doc has
> changed, you can fetch that revision with its history, then look at the
> history to see whether there are any intermediate revisions after the last
> one you knew about. If there are, you can fetch those too.
>
> —Jens
>

Re: following fast doc updates

Posted by Jens Alfke <je...@couchbase.com>.

On Sep 25, 2012, at 11:31 AM, Mark Hahn <ma...@hahnca.com> wrote:

> > The _changes feed only ever shows leaf revisions
>
> AARRGGHH.  I am so screwed.  I have been working on a scheme that relies on
> tracking every change.  And as everyone knows there is normally no way to
> find out what changed in a doc.  I am going to have to add a history of
> changes to each doc which is not only wasteful, but a pain to implement.

Seems like this should be doable. When the _changes feed says a doc has changed, you can fetch that revision with its history, then look at the history to see whether there are any intermediate revisions after the last one you knew about. If there are, you can fetch those too.

—Jens
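
[Editor's sketch] The history check described above could look roughly like this. It assumes the document was fetched with `?revs=true`, so that it carries CouchDB's `_revisions` object (`start` plus a newest-first list of `ids`); the function name `missed_revisions` and the sample hashes are made up for illustration. Note that the history only lists revision ids, and the bodies of older revisions may no longer be retrievable after compaction.

```python
def missed_revisions(revisions, last_known_rev):
    """Return the rev ids newer than last_known_rev, oldest first.

    `revisions` has the shape of CouchDB's `_revisions` field:
    {"start": N, "ids": [newest_hash, ..., oldest_hash]}.
    """
    start, ids = revisions["start"], revisions["ids"]
    # Rebuild the full "N-hash" revision ids, newest first.
    full = [f"{start - i}-{h}" for i, h in enumerate(ids)]
    if last_known_rev in full:
        newer = full[:full.index(last_known_rev)]
    else:
        # The last known revision is not in the listed history,
        # so treat everything listed as unseen.
        newer = full
    return list(reversed(newer))


# Hypothetical history: the feed reported 4-d4, we last processed 2-b2.
history = {"start": 4, "ids": ["d4", "c3", "b2", "a1"]}
print(missed_revisions(history, "2-b2"))  # ['3-c3', '4-d4']
```

Here `3-c3` is the intermediate revision the feed skipped and `4-d4` is the leaf it reported.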

Re: following fast doc updates

Posted by Mark Hahn <ma...@hahnca.com>.

>> I have been working on a scheme that relies on tracking every change.

>To do what?

I have a sort of pub-sub.  The client (through a websocket) can ask to be
notified on changes.  The subscription can be for just part of a doc.  For
example a product doc may be followed to watch for just price changes.  The
same doc may be followed by a different subscription to watch for product
spec changes.  (made up example).

Before now I've just been publishing the record to everyone on any doc
change.  This causes a lot of unnecessary traffic and what's worse is that
client code does a lot of work updating the DOM and other things way more
often than necessary, causing ugly things to happen.

> Why not storing a change as a  new doc?

This would change the _id which would break whatever linking there is
between records.

On Tue, Sep 25, 2012 at 11:46 PM, Benoit Chesneau <bc...@gmail.com> wrote:

> On Sep 25, 2012 8:32 PM, "Mark Hahn" <ma...@hahnca.com> wrote:
> >
> > > The _changes feed only ever shows leaf revisions
> >
> > AARRGGHH.  I am so screwed.  I have been working on a scheme that relies
> > on tracking every change.
>
> To do what?
>
> > And as everyone knows there is normally no way to
> > find out what changed in a doc.  I am going to have to add a history of
> > changes to each doc which is not only wasteful, but a pain to implement.
>
> Why not storing a change as a  new doc?

Re: following fast doc updates

Posted by Mark Hahn <ma...@hahnca.com>.

> A common approach is to save the current document as an attachment when
> you make an update

I am essentially doing this in my new scheme.  But actually I only store a
named flag that tells what portion of the doc changed along with a version
stamp.  I don't need to know the old values.  But I have to keep a record
in RAM of the latest version of every doc, otherwise I have no idea what
changes have already been processed.

I could of course remove the flag when the change is processed, which would
remove the need for the RAM table, but that would cause the double writing
of every doc.  A lot of schemes I considered caused double writing.

Saving a separate doc for each change would also double the number of
writes.

On Wed, Sep 26, 2012 at 3:01 AM, Robert Newson <rn...@apache.org> wrote:

> "then look at the history to see whether there are any intermediate
> revisions after the last one you knew about. If there are, you can
> fetch those too."
>
> Unless you compacted in between, in which case that data won't be there.
>
> The fundamental mistake being made here is believing that CouchDB even
> attempts to preserve a full history of your changes. It doesn't, and
> you will encounter problems if you think it does. And, to be
> absolutely clear, Jens' idea will not work unless you are prepared to
> be very careful about compaction scheduling (which I consider one of
> the largest CouchDB anti-patterns). An application should not break if
> someone compacts at the "wrong" time.
>
> CouchDB preserves only the current version of your data. Therefore,
> ensure the current version of your data includes *all* your data. In
> your case, your data is not just the current revision of documents,
> but also the history of changes. Make that a first class part of your
> application. Benoit's suggestion seems easiest, save the change as a
> document itself (Use a view to collate all the changes). You can also
> do it within the document too. A common approach is to save the
> current document as an attachment when you make an update
> (jquery.couch.js can already do this,
> https://friendpaste.com/inEzmxy0R933i0N4kyicj).
>
> B.

Re: following fast doc updates

Posted by Robert Newson <rn...@apache.org>.

"then look at the history to see whether there are any intermediate
revisions after the last one you knew about. If there are, you can
fetch those too."

Unless you compacted in between, in which case that data won't be there.

The fundamental mistake being made here is believing that CouchDB even
attempts to preserve a full history of your changes. It doesn't, and
you will encounter problems if you think it does. And, to be
absolutely clear, Jens' idea will not work unless you are prepared to
be very careful about compaction scheduling (which I consider one of
the largest CouchDB anti-patterns). An application should not break if
someone compacts at the "wrong" time.

CouchDB preserves only the current version of your data. Therefore,
ensure the current version of your data includes *all* your data. In
your case, your data is not just the current revision of documents,
but also the history of changes. Make that a first class part of your
application. Benoit's suggestion seems easiest: save the change as a
document itself (use a view to collate all the changes). You can also
do it within the document too. A common approach is to save the
current document as an attachment when you make an update
(jquery.couch.js can already do this,
https://friendpaste.com/inEzmxy0R933i0N4kyicj).

B.
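
[Editor's sketch] A minimal illustration of the "save each change as its own document" idea, with a plain Python function standing in for the collating view. The `_id` scheme, the field names (`doc_id`, `change_type`, `version`) and both function names are assumptions made for this example, not anything CouchDB prescribes.

```python
from collections import defaultdict


def make_change_doc(doc_id, change_type, version):
    # One small document per change. Because it is a separate document,
    # it survives compaction like any other current data.
    return {
        "_id": f"change:{doc_id}:{version:08d}",  # hypothetical id scheme
        "type": "change",
        "doc_id": doc_id,
        "change_type": change_type,
        "version": version,
    }


def collate(change_docs):
    # Stand-in for a view keyed on [doc_id, version]: groups the change
    # documents per target doc, ordered by version.
    grouped = defaultdict(list)
    for c in sorted(change_docs, key=lambda c: (c["doc_id"], c["version"])):
        grouped[c["doc_id"]].append((c["version"], c["change_type"]))
    return dict(grouped)


changes = [
    make_change_doc("p1", "price", 2),
    make_change_doc("p1", "spec", 1),
    make_change_doc("p2", "price", 1),
]
print(collate(changes))
```

The cost is one extra write per change; the benefit is that the history is ordinary, current data.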

On 26 September 2012 07:46, Benoit Chesneau <bc...@gmail.com> wrote:
> On Sep 25, 2012 8:32 PM, "Mark Hahn" <ma...@hahnca.com> wrote:
>>
>> > The _changes feed only ever shows leaf revisions
>>
>> AARRGGHH.  I am so screwed.  I have been working on a scheme that relies
>> on tracking every change.
>
> To do what?
>
>> And as everyone knows there is normally no way to
>> find out what changed in a doc.  I am going to have to add a history of
>> changes to each doc which is not only wasteful, but a pain to implement.
>
> Why not storing a change as a  new doc?
>>
>> Thanks for taking the trouble to give me bad news.
>>
>>
>> On Tue, Sep 25, 2012 at 10:19 AM, Adam Kocoloski <kocolosk@apache.org> wrote:
>>
>> > On Sep 24, 2012, at 5:16 PM, Mark Hahn <ma...@hahnca.com> wrote:
>> >
>> > > If I update a particular doc multiple times rapidly, is each update
>> > > guaranteed to show up in a continuous changes feed?  I am worried that
>> > > the change feed will be optimized to just show the latest value of a doc
>> > > with multiple updates.  This would break my logic.
>> >
>> > Your worries are justified.  The _changes feed only ever shows leaf
>> > revisions (i.e., latest updates to branches of the edit tree).  Regards,
>> >
>> > Adam

Re: following fast doc updates

Posted by Benoit Chesneau <bc...@gmail.com>.

On Sep 25, 2012 8:32 PM, "Mark Hahn" <ma...@hahnca.com> wrote:
>
> > The _changes feed only ever shows leaf revisions
>
> AARRGGHH.  I am so screwed.  I have been working on a scheme that relies
> on tracking every change.

To do what?

> And as everyone knows there is normally no way to
> find out what changed in a doc.  I am going to have to add a history of
> changes to each doc which is not only wasteful, but a pain to implement.

Why not storing a change as a  new doc?
>
> Thanks for taking the trouble to give me bad news.
>
>
> On Tue, Sep 25, 2012 at 10:19 AM, Adam Kocoloski <kocolosk@apache.org> wrote:
>
> > On Sep 24, 2012, at 5:16 PM, Mark Hahn <ma...@hahnca.com> wrote:
> >
> > > If I update a particular doc multiple times rapidly, is each update
> > > guaranteed to show up in a continuous changes feed?  I am worried that
> > > the change feed will be optimized to just show the latest value of a doc
> > > with multiple updates.  This would break my logic.
> >
> > Your worries are justified.  The _changes feed only ever shows leaf
> > revisions (i.e., latest updates to branches of the edit tree).  Regards,
> >
> > Adam

Re: following fast doc updates

Posted by Mark Hahn <ma...@hahnca.com>.

> Alternatively, you could use two dbs.

I came up with two or three schemes that require twice the number of
writes.  It bothers me to double the number of writes, although maybe it
shouldn't.

Right now I'm implementing a scheme where a hash is saved in RAM for every
doc id and "change type" with the version number of that change.  A "change
type" table is also kept in the doc with the version number in which the
change happened.  By comparing the two version numbers I can detect
each unique change, even when multiple have occurred.
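
[Editor's sketch] One way the comparison described above might be written. It assumes each doc carries a `changes` table mapping change type to the version in which that part last changed; the field name, the class name and the sample data are invented for the example.

```python
class ChangeTracker:
    """In-memory table of the last processed version per (doc id, change type)."""

    def __init__(self):
        self.seen = {}  # (doc_id, change_type) -> last processed version

    def new_changes(self, doc):
        """Return the change types in doc["changes"] not yet processed."""
        fresh = []
        for change_type, version in doc.get("changes", {}).items():
            key = (doc["_id"], change_type)
            # A stored version newer than the one in memory means this
            # part of the doc changed since we last looked.
            if version > self.seen.get(key, 0):
                self.seen[key] = version
                fresh.append(change_type)
        return sorted(fresh)


tracker = ChangeTracker()
first = tracker.new_changes({"_id": "p1", "changes": {"price": 3, "spec": 1}})
# The feed skipped version 4; only the price table entry moved (3 -> 5).
second = tracker.new_changes({"_id": "p1", "changes": {"price": 5, "spec": 1}})
print(first, second)  # ['price', 'spec'] ['price']
```

Because the table lives inside the current revision, a skipped intermediate revision does not hide which parts changed, only how many times.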

Boy this is a pain.  If someone can come up with a simpler scheme to catch
each unique change it would be appreciated.

On Tue, Sep 25, 2012 at 12:02 PM, Paul Davis <pa...@gmail.com> wrote:

> Alternatively, you could use two dbs. One db you could write "change
> requests" to (each request as a new doc) and then listen for changes
> on that and apply them in that logic. This also has that added benefit
> that you could do the timestamped dbname pattern for your changes feed
> dbs to (possibly depending on use case) remove some of the cruft
> buildup.
>
> On Tue, Sep 25, 2012 at 1:31 PM, Mark Hahn <ma...@hahnca.com> wrote:
> >> The _changes feed only ever shows leaf revisions
> >
> > AARRGGHH.  I am so screwed.  I have been working on a scheme that relies
> > on tracking every change.  And as everyone knows there is normally no way to
> > find out what changed in a doc.  I am going to have to add a history of
> > changes to each doc which is not only wasteful, but a pain to implement.
> >
> > Thanks for taking the trouble to give me bad news.
> >
> >
> > On Tue, Sep 25, 2012 at 10:19 AM, Adam Kocoloski <kocolosk@apache.org> wrote:
> >
> >> On Sep 24, 2012, at 5:16 PM, Mark Hahn <ma...@hahnca.com> wrote:
> >>
> >> > If I update a particular doc multiple times rapidly, is each update
> >> > guaranteed to show up in a continuous changes feed?  I am worried that
> >> > the change feed will be optimized to just show the latest value of a doc
> >> > with multiple updates.  This would break my logic.
> >>
> >> Your worries are justified.  The _changes feed only ever shows leaf
> >> revisions (i.e., latest updates to branches of the edit tree).  Regards,
> >>
> >> Adam
>

Re: following fast doc updates

Posted by svilen <az...@svilendobrev.com>.

just building a log-of-changes may or may not be enough, depending on
what your docs represent.

there is some "revisioning" here:
http://blog.couchbase.com/simple-document-versioning-couchdb
basically keeping all revisions as attachments.

have fun

ciao
svil


On Tue, 25 Sep 2012 14:02:09 -0500
Paul Davis <pa...@gmail.com> wrote:

> Alternatively, you could use two dbs. One db you could write "change
> requests" to (each request as a new doc) and then listen for changes
> on that and apply them in that logic. This also has that added benefit
> that you could do the timestamped dbname pattern for your changes feed
> dbs to (possibly depending on use case) remove some of the cruft
> buildup.
> 
> On Tue, Sep 25, 2012 at 1:31 PM, Mark Hahn <ma...@hahnca.com> wrote:
> >> The _changes feed only ever shows leaf revisions
> >
> > AARRGGHH.  I am so screwed.  I have been working on a scheme that
> > relies on tracking every change.  And as everyone knows there is
> > normally no way to find out what changed in a doc.  I am going to
> > have to add a history of changes to each doc which is not only
> > wasteful, but a pain to implement.
> >
> > Thanks for taking the trouble to give me bad news.
> >
> >
> > On Tue, Sep 25, 2012 at 10:19 AM, Adam Kocoloski
> > <ko...@apache.org>wrote:
> >
> >> On Sep 24, 2012, at 5:16 PM, Mark Hahn <ma...@hahnca.com> wrote:
> >>
> >> > If I update a particular doc multiple times rapidly, is each
> >> > update guaranteed to show up in a continuous changes feed?  I am
> >> > worried that the change feed will be optimized to just show the
> >> > latest value of a doc with multiple updates.  This would break my
> >> > logic.
> >>
> >> Your worries are justified.  The _changes feed only ever shows leaf
> >> revisions (i.e., latest updates to branches of the edit tree).
> >> Regards,
> >>
> >> Adam

Re: following fast doc updates

Posted by Paul Davis <pa...@gmail.com>.

Alternatively, you could use two dbs. In one db you could write "change
requests" (each request as a new doc), then listen for changes on that
db and apply them in your logic. This also has the added benefit that
you could use the timestamped dbname pattern for your changes feed dbs
to (possibly, depending on use case) remove some of the cruft buildup.
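
[Editor's sketch] A rough illustration of the two-db idea: each change request is its own document in a requests db, so every request shows up in that db's changes feed, and a listener applies it to the target doc. The monthly naming scheme, the `set` field and both function names are assumptions for this example only.

```python
from datetime import date


def changes_db_name(day, prefix="change_requests"):
    # Timestamped dbname pattern (hypothetical monthly scheme): old
    # request dbs can be dropped wholesale once they are processed.
    return f"{prefix}_{day.year:04d}_{day.month:02d}"


def apply_request(doc, request):
    # Return an updated copy of the target doc; request["set"] maps
    # field names to their new values.
    updated = dict(doc)
    updated.update(request["set"])
    return updated


product = {"_id": "p1", "price": 10, "spec": "v1"}
request = {"doc_id": "p1", "set": {"price": 12}}
print(changes_db_name(date(2012, 9, 25)))  # change_requests_2012_09
print(apply_request(product, request))
```

Since a request is never updated after it is written, rapid successive requests cannot collapse into one feed entry the way rapid updates to a single doc can.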

On Tue, Sep 25, 2012 at 1:31 PM, Mark Hahn <ma...@hahnca.com> wrote:
>> The _changes feed only ever shows leaf revisions
>
> AARRGGHH.  I am so screwed.  I have been working on a scheme that relies on
> tracking every change.  And as everyone knows there is normally no way to
> find out what changed in a doc.  I am going to have to add a history of
> changes to each doc which is not only wasteful, but a pain to implement.
>
> Thanks for taking the trouble to give me bad news.
>
>
> On Tue, Sep 25, 2012 at 10:19 AM, Adam Kocoloski <ko...@apache.org> wrote:
>
>> On Sep 24, 2012, at 5:16 PM, Mark Hahn <ma...@hahnca.com> wrote:
>>
>> > If I update a particular doc multiple times rapidly, is each update
>> > guaranteed to show up in a continuous changes feed?  I am worried that
>> > the change feed will be optimized to just show the latest value of a doc with
>> > multiple updates.  This would break my logic.
>>
>> Your worries are justified.  The _changes feed only ever shows leaf
>> revisions (i.e., latest updates to branches of the edit tree).  Regards,
>>
>> Adam

Re: following fast doc updates

Posted by Mark Hahn <ma...@hahnca.com>.

> The _changes feed only ever shows leaf revisions

AARRGGHH.  I am so screwed.  I have been working on a scheme that relies on
tracking every change.  And as everyone knows there is normally no way to
find out what changed in a doc.  I am going to have to add a history of
changes to each doc which is not only wasteful, but a pain to implement.

Thanks for taking the trouble to give me bad news.

On Tue, Sep 25, 2012 at 10:19 AM, Adam Kocoloski <ko...@apache.org> wrote:

> On Sep 24, 2012, at 5:16 PM, Mark Hahn <ma...@hahnca.com> wrote:
>
> > If I update a particular doc multiple times rapidly, is each update
> > guaranteed to show up in a continuous changes feed?  I am worried that
> > the change feed will be optimized to just show the latest value of a doc with
> > multiple updates.  This would break my logic.
>
> Your worries are justified.  The _changes feed only ever shows leaf
> revisions (i.e., latest updates to branches of the edit tree).  Regards,
>
> Adam

Re: following fast doc updates

Posted by Adam Kocoloski <ko...@apache.org>.

On Sep 24, 2012, at 5:16 PM, Mark Hahn <ma...@hahnca.com> wrote:

> If I update a particular doc multiple times rapidly, is each update
> guaranteed to show up in a continuous changes feed?  I am worried that the
> change feed will be optimized to just show the latest value of a doc with
> multiple updates.  This would break my logic.

Your worries are justified.  The _changes feed only ever shows leaf revisions (i.e., latest updates to branches of the edit tree).

Regards,
Adam
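
[Editor's sketch] A toy model of why a reader of the feed can miss intermediate updates. It is a deliberate simplification (one entry per doc, no conflicts or branches), and `ToyDb` is an invented class, not CouchDB code. A continuous listener that keeps up may still happen to see each update as it is written, but nothing guarantees it.

```python
class ToyDb:
    """Keeps one sequence entry per doc id, like a by-sequence index."""

    def __init__(self):
        self.update_seq = 0
        self.seq_of = {}  # doc_id -> sequence number of its latest update
        self.docs = {}

    def save(self, doc_id, body):
        self.update_seq += 1
        # The doc's previous entry is replaced, not kept alongside.
        self.seq_of[doc_id] = self.update_seq
        self.docs[doc_id] = body

    def changes(self, since=0):
        rows = [(seq, doc_id) for doc_id, seq in self.seq_of.items() if seq > since]
        return sorted(rows)


db = ToyDb()
db.save("a", {"n": 1})
db.save("a", {"n": 2})
db.save("b", {"n": 1})
db.save("a", {"n": 3})
print(db.changes())  # [(3, 'b'), (4, 'a')]
```

Three updates to "a" produce a single row for it, at the sequence of the latest update.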