Posted to user@couchdb.apache.org by Heiko Schaefer <hs...@fto.de> on 2010/09/29 16:27:57 UTC

Efficient couchdb queries with explicitly specified keyset of >1000 IDs?

Hello Couchdb-User-List,

I've been building a little family of REST services with couchdb over the past
months and have made a lot of progress. It's a great experience to get
to know and use couchdb. Right now I'm trying to figure out how to
implement a new requirement - and I feel a little lost.


Here's what I am considering, and would like to know if it sounds at all
feasible:

I expect to have on the order of 100,000 documents in my couchdb, very
soon. But an (independent and external) service may define groups of
documents (a few thousand documents being one group). I'd like to be
able to do searches (query couchdb views) on just the subset of documents
that are in one group.
I'm not very keen to mark my documents themselves as members of a group,
if it can be at all avoided. So I thought maybe I can make queries that
explicitly list the relevant few thousand couch IDs in queries.


Example use-case:

There are 100,000 documents in the couchdb. An externally supplied list
specifies a subset of 5,000 IDs.
I'd like to efficiently query a view, but only considering the documents
within this subset of 5,000 IDs. Then I would like to further narrow
down these results by other keys.
[The IDs would probably not map 1:1 onto the couchdb "_id", but rather
onto another field (or two) of the documents]
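
For illustration: CouchDB views accept a POST request whose JSON body
contains a "keys" array, which is one way to express the explicit keyset
described above. The sketch below only builds the request bodies and does
not talk to a server. The batch size and the ID format are assumptions,
and the view being queried would have to emit the external ID field as
its key.

```python
import json


def keys_request_bodies(ids, batch_size=1000):
    """Yield JSON bodies for POSTs to a view, at most batch_size keys each.

    Each body would be sent with Content-Type: application/json to a path
    such as /<db>/_design/<ddoc>/_view/<view> (placeholder names).
    Batching is optional; it only keeps individual requests small.
    """
    ids = list(ids)
    for start in range(0, len(ids), batch_size):
        yield json.dumps({"keys": ids[start:start + batch_size]})


# Hypothetical external IDs standing in for the supplied subset.
subset = [f"id-{n}" for n in range(5000)]
bodies = list(keys_request_bodies(subset))
```

Rows come back per requested key, so any further narrowing by other
fields would have to happen in the client or through a different view.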


Would that be as mad (and inefficient) as it sounds? Is there any other
way to achieve my objectives with couchdb that I might be missing?


Sorry if my question is not worded properly or clearly - I don't feel
fluent in the couchdb terminology and way of thinking yet.
Thanks in advance for any help and pointers.

cheers,
:) Heiko

Re: Efficient couchdb queries with explicitly specified keyset of >1000 IDs?

Posted by Simon Metson <si...@googlemail.com>.

Hi,
I think you could use the "fetch related data" feature added in 0.11
(see http://blog.couchone.com/post/446015664/whats-new-in-apache-couchdb-0-11-part-two-views
for a nice summary). You could have a doc defining your groups, emit
one row per group member from it, and then use include_docs to pull in
the documents in that group. I've not used that feature for thousands
of dependent docs, but I suspect it should work and not be too
inefficient, depending on your document structure. If you want to
narrow down by other keys, they'd need to be in the group document (I
think - I don't think you can further filter on the included docs).
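
A minimal sketch of that idea, assuming a group document shaped like
{"_id": "group-a", "type": "group", "members": ["doc-1", "doc-2"]}.
The database name "docs", the design doc "groups", the view "members"
and the field names are all made up for illustration. The feature links
by "_id", so it only fits if the group lists real couchdb document IDs.

```python
import json
from urllib.parse import urlencode

# JavaScript map function, stored as a string in the design document.
# Emitting a value that carries an _id makes include_docs=true return
# that linked document instead of the group document itself.
MEMBERS_MAP = """
function (doc) {
  if (doc.type === "group" && doc.members) {
    doc.members.forEach(function (id) {
      emit(doc._id, {_id: id});
    });
  }
}
"""


def design_doc():
    """Design document holding the (hypothetical) members view."""
    return {"_id": "_design/groups", "views": {"members": {"map": MEMBERS_MAP}}}


def members_url(base, db, group_id):
    """URL that fetches every member document of one group."""
    # View keys are JSON values, so the key is JSON-encoded before quoting.
    query = urlencode({"key": json.dumps(group_id), "include_docs": "true"})
    return f"{base}/{db}/_design/groups/_view/members?{query}"


url = members_url("http://localhost:5984", "docs", "group-a")
```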

The alternative is to store in each document the list of groups it
belongs to (which you don't want to do) and to build a view
indexed by group.
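
As a rough sketch of that alternative, assuming each document carries a
"groups" array and a "category" field (both names are invented here): a
compound key of [group, category] lets one view answer "everything in
group X" as well as "everything in group X with category Y" through a
key range.

```python
import json

# JavaScript map function for the (hypothetical) by_group view.
BY_GROUP_MAP = """
function (doc) {
  (doc.groups || []).forEach(function (group) {
    emit([group, doc.category], null);
  });
}
"""


def group_range(group, category=None):
    """Query parameters selecting one group, optionally one category in it."""
    start = [group] if category is None else [group, category]
    # In view collation an object sorts after strings, numbers and arrays,
    # so appending {} closes the range over every key with this prefix.
    end = start + [{}]
    return {"startkey": json.dumps(start), "endkey": json.dumps(end)}


whole_group = group_range("group-a")
narrowed = group_range("group-a", "news")
```

The returned parameters would be passed on the query string of a GET on
the view, URL-encoded.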

If the documents are only in one group, and all queries are only  
interesting for the group en masse (e.g. you don't need to have views  
over the whole dataset, or those views are simple) you could have a  
database per group.
Cheers
Simon

Re: Efficient couchdb queries with explicitly specified keyset of >1000 IDs?

Posted by Nils Breunese <N....@vpro.nl>.

Heiko Schaefer wrote:

> I expect to have on the order of 100,000 documents in my couchdb, very
> soon. But an (independent and external) service may define groups of
> documents (a few thousand documents being one group). I'd like to be
> able to do searches (query couchdb views) on just the subset of documents
> that are in one group.
> I'm not very keen to mark my documents themselves as members of a group,
> if it can be at all avoided. So I thought maybe I can make queries that
> explicitly list the relevant few thousand couch IDs in queries.

Why are you not very keen on adding the groupings to the documents? That would solve your problems in a very relaxing way AFAIK.

Nils.