Posted to user@couchdb.apache.org by Nathan Vander Wilt <na...@calftrail.com> on 2013/12/06 00:14:58 UTC

Is startkey_docid as scalable as startkey?

Let's say for every doc I `emit([doc.user])` and, when a user requests a document ID, my middleware does `GET …/docs_by_user?startkey=[req.user.name]&endkey=[req.user.name,{}]&include_docs=true&limit=1&startkey_docid=req.param.id`. I return the row's doc, or 404 if the range is empty. Basically, I'm giving each user read access to "their own" objects without having to give them their own database.
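A minimal sketch of the setup described above, assuming a JavaScript middleware. The function names (`ownDocQuery`, `pickOwnDoc`) are hypothetical, and the map function assumes a null value is enough because `include_docs=true` fetches the doc. One thing the sketch makes explicit: with `limit=1`, the first row at or after `startkey_docid` is returned, so if the requested ID is not in the user's range the row may belong to a neighbouring doc. Comparing `row.id` with the requested ID guards against that.

```javascript
// Map function for the `docs_by_user` view. It is only defined here, not
// called; CouchDB supplies `emit` when it runs the view.
function map(doc) {
  if (doc.user) emit([doc.user], null);
}

// Build the query string for one (user, docId) lookup. View keys are
// JSON-encoded and then URL-encoded; startkey_docid is a plain string.
function ownDocQuery(userName, docId) {
  const params = new URLSearchParams({
    startkey: JSON.stringify([userName]),
    endkey: JSON.stringify([userName, {}]),
    startkey_docid: docId,
    include_docs: 'true',
    limit: '1',
  });
  return params.toString();
}

// Return the doc only if the first row really is the requested one;
// null means the middleware should answer 404.
function pickOwnDoc(viewResult, docId) {
  const row = viewResult.rows[0];
  return row && row.id === docId ? row.doc : null;
}
```

The middleware would append `ownDocQuery(req.user.name, req.param.id)` to the view URL and pass the parsed response to `pickOwnDoc`.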

I'm wondering, though, whether `startkey_docid` is as scalable as `startkey` itself. IIRC, the doc ids are simply a final extra group level internally (clearly they determine sort order), but if this behaves more like `&skip=lots` instead, then relying heavily on the query above would be something of an anti-pattern.

(Bonuses: If this _is_ still a reasonable solution, I'm assuming I can't simplify my emit/query to use `&key=name&startkey_docid=id`, right? Alternatively, would it be more efficient but just as correct to emit plain string keys and limit my range to `&startkey=name&endkey=name+"\0"`?)

thanks,
-natevw

Re: Is startkey_docid as scalable as startkey?

Posted by Nathan Vander Wilt <na...@calftrail.com>.
Excellent, thanks! Third time's a charm, or…should answer your followup question anyway ;-)
-nvw




Re: Is startkey_docid as scalable as startkey?

Posted by Robert Newson <rn...@apache.org>.
Well, that'll teach me to multi-task and skim emails...

startkey_docid has the same 'scalability' as startkey, in the sense
that startkey and startkey+startkey_docid are both O(log n) lookups.

key=name&startkey_docid=id ought to work, as key=foo is, internally,
startkey=foo&endkey=foo (possibly verbatim).
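As a rough mental model of that equivalence (not CouchDB's actual code), the sketch below rewrites a `key` parameter into the explicit range it stands for and leaves every other parameter, including `startkey_docid`, untouched. The helper name `expandKey` is hypothetical.

```javascript
// Expand `key` into the equivalent startkey/endkey pair. Parameters are
// plain objects whose values are already JSON-encoded strings.
function expandKey(params) {
  if (!('key' in params)) return { ...params };
  const { key, ...rest } = params;
  return { ...rest, startkey: key, endkey: key };
}

// Example: the simplified query from the original post.
const expanded = expandKey({ key: '"name"', startkey_docid: 'id' });
console.log(expanded);
```

Under this model, `startkey_docid` has a `startkey` to attach to even though only `key` was written in the URL.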

To get back to your use case, I'm assuming doc.user is not unique but,
somehow, you know the id of the doc you're looking for? If so,
why not just use _all_docs?key=req.param.id and not build the view
at all?
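A sketch of what that `_all_docs` lookup could look like, with hypothetical helper names and a hypothetical database name. Since `_all_docs` is keyed by doc ID only, the ownership check from the original post has to happen in the middleware; comparing `doc.user` to the requesting user is an assumption about how that would be done, not something stated in the thread.

```javascript
// Build the _all_docs path for a single doc ID. The key is a JSON string,
// so it is JSON-encoded before being URL-encoded.
function allDocsPath(db, docId) {
  const key = encodeURIComponent(JSON.stringify(docId));
  return `/${encodeURIComponent(db)}/_all_docs?key=${key}&include_docs=true`;
}

// Return the doc only if it exists and belongs to the requesting user;
// null means the middleware should answer 404.
function ownedDoc(allDocsResult, userName) {
  const row = allDocsResult.rows[0];
  return row && row.doc && row.doc.user === userName ? row.doc : null;
}
```

For example, `allDocsPath('mydb', req.param.id)` gives the request path, and `ownedDoc(response, req.user.name)` decides between returning the doc and a 404.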





Re: Is startkey_docid as scalable as startkey?

Posted by Robert Newson <rn...@apache.org>.
To be clearer, startkey_docid is *ignored* unless you also specify startkey.

B.



Re: Is startkey_docid as scalable as startkey?

Posted by Robert Newson <rn...@apache.org>.
The question is meaningless; let me explain.

startkey_docid (and endkey_docid) are used for selecting ranges where
the view key is the same; it is *not* a separate index. Views are in
key order only.

Under the covers, the true view key is actually [emitted_key_order,
doc._id]; the rows are unique in the b+tree.
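To illustrate why the doc ID adds no extra cost, here is a toy in-memory model: rows are kept sorted by the pair (key, id), and finding the start of a range is one binary search whether or not a doc ID is supplied. This is a sketch under simplifying assumptions, not CouchDB's implementation: a sorted array stands in for the b+tree, plain string comparison stands in for view collation, and `seek` is a hypothetical name.

```javascript
// Order rows by emitted key first, then by doc id.
function compareRows(a, b) {
  if (a.key !== b.key) return a.key < b.key ? -1 : 1;
  if (a.id !== b.id) return a.id < b.id ? -1 : 1;
  return 0;
}

// Lower-bound binary search: index of the first row >= (startkey, docid).
// Omitting the doc id is the same as seeking to the first row for the key.
function seek(rows, startkey, startkeyDocid = '') {
  const target = { key: startkey, id: startkeyDocid };
  let lo = 0;
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (compareRows(rows[mid], target) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

const rows = [
  { key: 'alice', id: 'a1' },
  { key: 'alice', id: 'a2' },
  { key: 'alice', id: 'a3' },
  { key: 'bob', id: 'b1' },
];

console.log(seek(rows, 'alice'));       // 0
console.log(seek(rows, 'alice', 'a3')); // 2
```

Both calls do the same number of comparisons in the worst case, which is the sense in which a doc-ID seek differs from `skip`: nothing is walked over and discarded.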

B.

