Posted to user@couchdb.apache.org by Jens Alfke <je...@mooseyard.com> on 2010/02/26 17:27:13 UTC
Consistent version of a view across multiple queries?
If an app wants to iterate over a large view, it seems better to page the output by issuing multiple queries, using the startkey= and limit= parameters. However, this seems to introduce race conditions if another client is meanwhile altering the database. I might see half of the documents before the change and half after. For example, I might see a document show up twice with two different key values.
Is there any way to avoid this inconsistency? In a SQL database I'd use a transaction for this, to lock out any database updates in between my series of SELECTs. But CouchDB's architecture doesn't support that.
It seems like what I want is to specify some kind of clock (timestamp / revision #) in my view queries, so they all run over the exact same view b-tree. This seems straightforward at the level of the CouchDB file-format, since it's append-only and the previous view b-tree still exists in the file. But is this exposed in the API at all?
—Jens
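The race described above can be reproduced without a server. The following is only a rough sketch: the in-memory list stands in for a view, plain Python tuple ordering stands in for CouchDB's key collation, and the helper names (`query_view`, `paged_read`, `rekey_doc1`) are made up for illustration. The `start_after` argument plays the role of `startkey=` combined with `startkey_docid=` and `skip=1`.

```python
def query_view(rows, limit, start_after=None):
    """Mimic one paged view query over in-memory (key, doc_id) rows.

    start_after stands in for startkey + startkey_docid + skip=1:
    only rows strictly after the last row of the previous page are returned.
    """
    ordered = sorted(rows)  # plain tuple ordering, not CouchDB's collation
    if start_after is not None:
        ordered = [row for row in ordered if row > start_after]
    return ordered[:limit]


def paged_read(view, page_size, after_first_page=None):
    """Read the whole view page by page; `view` is a mutable list of rows."""
    seen = []
    last = None
    while True:
        page = query_view(view, page_size, start_after=last)
        if not page:
            return seen
        seen.extend(page)
        last = page[-1]
        if after_first_page is not None:
            after_first_page()  # simulate another client's write between pages
            after_first_page = None


view = [("a", "doc1"), ("b", "doc2"), ("c", "doc3"), ("d", "doc4")]


def rekey_doc1():
    # Another client updates doc1 so that it now emits key "e" instead of "a".
    view.remove(("a", "doc1"))
    view.append(("e", "doc1"))


rows = paged_read(view, page_size=2, after_first_page=rekey_doc1)
doc_ids = [doc_id for _key, doc_id in rows]
print(doc_ids)
```

Because the write lands between the first and second page, doc1 is read once under its old key and once under its new key, which is the duplicate described in the question.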
Re: Consistent version of a view across multiple queries?
Posted by Chris Anderson <jc...@apache.org>.
On Sat, Feb 27, 2010 at 9:06 AM, Jens Alfke <je...@mooseyard.com> wrote:
>
> On Feb 26, 2010, at 10:56 AM, Chris Anderson wrote:
>
>> No, but it should be. I've been thinking about this for a while.
>
> Cool :) My immediate idea is to return a _rev key in a view result, like a document, whose value changes each time the view is rebuilt. In a query you could optionally add something like "&rev=" to specify which revision to use.
>
We should definitely be discussing this on dev@.
In a nutshell what we've discussed before is basing the view etag on
the last seq-id of the database which changed anything in the index.
We already track this at a view-group (design document) level but
don't expose it. To do it for a single view in a group, we'd have to
do some additional coding.
> Of course now you have to store a mapping from revision numbers to the location of the view's tree in the db file. A quick and dirty way to do this might be to optimize for only recently-obsoleted view results, and just chain the results in a linked list. So the internal data for each view b-tree would contain its _rev value, and the position in the file of the previous generation tree. [I don't know the details of the file format, though, so this might not make sense.]
>
>> Main complication is that the old seq might not be available if a view compaction completes in between queries.
>
> Yeah, eventually you always run into that :/ Maybe compaction could optionally preserve the last couple of generations of a view? Or just specific generations that have been actively used in queries in the last N minutes?
All this sounds doable at a cost of complexity. But we are getting
toward the time to think about "post-1.0" optimization etc, so it's
worth researching.
Chris
>
> —Jens
--
Chris Anderson
http://jchrisa.net
http://couch.io
Re: Consistent version of a view across multiple queries?
Posted by Jens Alfke <je...@mooseyard.com>.
On Feb 26, 2010, at 10:56 AM, Chris Anderson wrote:
> No, but it should be. I've been thinking about this for a while.
Cool :) My immediate idea is to return a _rev key in a view result, like a document, whose value changes each time the view is rebuilt. In a query you could optionally add something like "&rev=" to specify which revision to use.
Of course now you have to store a mapping from revision numbers to the location of the view's tree in the db file. A quick and dirty way to do this might be to optimize for only recently-obsoleted view results, and just chain the results in a linked list. So the internal data for each view b-tree would contain its _rev value, and the position in the file of the previous generation tree. [I don't know the details of the file format, though, so this might not make sense.]
> Main complication is that the old seq might not be available if a view compaction completes in between queries.
Yeah, eventually you always run into that :/ Maybe compaction could optionally preserve the last couple of generations of a view? Or just specific generations that have been actively used in queries in the last N minutes?
—Jens
Re: Consistent version of a view across multiple queries?
Posted by Chris Anderson <jc...@gmail.com>.
On Feb 26, 2010, at 8:27 AM, Jens Alfke <je...@mooseyard.com> wrote:
> If an app wants to iterate over a large view, it seems better to
> page the output by issuing multiple queries, using the startkey= and
> limit= parameters. However, this seems to introduce race conditions
> if another client is meanwhile altering the database. I might see
> half of the documents before the change and half after. For example,
> I might see a document show up twice with two different key values.
>
> Is there any way to avoid this inconsistency? In a SQL database I'd
> use a transaction for this, to lock out any database updates in
> between my series of SELECTs. But CouchDB's architecture doesn't
> support that.
>
> It seems like what I want is to specify some kind of clock
> (timestamp / revision #) in my view queries, so they all run over
> the exact same view b-tree. This seems straightforward at the level
> of the CouchDB file-format, since it's append-only and the previous
> view b-tree still exists in the file. But is this exposed in the API
> at all?
No, but it should be. I've been thinking about this for a while. Main
complication is that the old seq might not be available if a view
compaction completes in between queries.
Chris
>
> —Jens
Re: Consistent version of a view across multiple queries?
Posted by Robert Newson <ro...@gmail.com>.
You can query with stale=ok and the view won't change (as long as no
other call happens without stale=ok). You'll have to call without
stale=ok sometimes, though, so you'll still need to take care. Does
that help?
B.
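As a rough sketch of the suggestion above, the paging URLs could carry `stale=ok` on every request. The helper name `stale_page_url` and the database and design-document path are hypothetical; `limit`, `startkey`, `startkey_docid`, `skip` and `stale` are the view query parameters being discussed.

```python
import json
from urllib.parse import urlencode


def stale_page_url(view_path, page_size, last_row=None):
    """Build the URL for the next page of a view, always passing stale=ok.

    last_row is the final row of the previous page (a dict with "key" and
    "id"), or None when requesting the first page.
    """
    params = {"limit": page_size, "stale": "ok"}
    if last_row is not None:
        # View keys are JSON values, so they are JSON-encoded in the URL.
        params["startkey"] = json.dumps(last_row["key"])
        # startkey_docid disambiguates rows that share the same key.
        params["startkey_docid"] = last_row["id"]
        # Skip the row already seen at the end of the previous page.
        params["skip"] = 1
    return view_path + "?" + urlencode(params)


# Hypothetical view path, for illustration only.
VIEW = "/db/_design/app/_view/by_name"

first_url = stale_page_url(VIEW, 10)
next_url = stale_page_url(VIEW, 10, {"key": "bob", "id": "doc7"})
print(first_url)
print(next_url)
```

This only keeps the pages consistent while no other request hits the same view without `stale=ok`, which is the caveat noted above.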
On Fri, Feb 26, 2010 at 11:27 AM, Jens Alfke <je...@mooseyard.com> wrote:
> If an app wants to iterate over a large view, it seems better to page the output by issuing multiple queries, using the startkey= and limit= parameters. However, this seems to introduce race conditions if another client is meanwhile altering the database. I might see half of the documents before the change and half after. For example, I might see a document show up twice with two different key values.
>
> Is there any way to avoid this inconsistency? In a SQL database I'd use a transaction for this, to lock out any database updates in between my series of SELECTs. But CouchDB's architecture doesn't support that.
>
> It seems like what I want is to specify some kind of clock (timestamp / revision #) in my view queries, so they all run over the exact same view b-tree. This seems straightforward at the level of the CouchDB file-format, since it's append-only and the previous view b-tree still exists in the file. But is this exposed in the API at all?
>
> —Jens