Posted to user@couchdb.apache.org by Alexander Uvarov <al...@gmail.com> on 2010/05/16 21:33:49 UTC

Any better solution for my case?

Each user has their own database. Users can store documents. Documents have many predefined and custom parameters, along with tags. Users should be able to create so-called "collections" of their documents. By a collection I mean a set of criteria, so a particular collection would look like:

{
  "type": "Item",
  "color": "black",
  "condition": "mint",
  "pages_count": { "in": [1, 300] },
  "tags": ["cool", "awesome", "sweet"]
}

Currently I see one and only one solution -- just use couchdb-lucene and transform the criteria into an actual Lucene query string.
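
A rough sketch of that transformation, for illustration only. The helper names (quoteTerm, criteriaToLucene) are hypothetical, the choice to AND all tags together is an assumption, and numeric ranges in couchdb-lucene may need the field to be indexed as a number, which this sketch does not address:

```javascript
// Hypothetical helper: quote a term for a Lucene query string,
// escaping embedded backslashes and double quotes.
function quoteTerm(value) {
  return '"' + String(value).replace(/(["\\])/g, '\\$1') + '"';
}

// Hypothetical helper: turn a criteria document into a Lucene query string.
function criteriaToLucene(criteria) {
  var clauses = [];
  Object.keys(criteria).forEach(function (field) {
    var value = criteria[field];
    if (Array.isArray(value)) {
      // A list of values (e.g. tags): assume every value must match.
      value.forEach(function (v) {
        clauses.push(field + ':' + quoteTerm(v));
      });
    } else if (value !== null && typeof value === 'object' &&
               Array.isArray(value['in'])) {
      // {"in": [min, max]} becomes an inclusive Lucene range.
      clauses.push(field + ':[' + value['in'][0] + ' TO ' + value['in'][1] + ']');
    } else {
      // Plain value: exact match on the field.
      clauses.push(field + ':' + quoteTerm(value));
    }
  });
  return clauses.join(' AND ');
}
```

For the collection above this yields type:"Item" AND color:"black" AND condition:"mint" AND pages_count:[1 TO 300] AND tags:"cool" AND tags:"awesome" AND tags:"sweet".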

Any better ideas?

Well, another one is to patch CouchDB and add the ability to pass additional parameters to a *generic map function*, storing these parameters in the design doc. Too brave actually, a kind of insanity. But I am going to start hacking if this would have a chance of being accepted into the mainline. It may sound selfish, but I think I am not the only one who would find this useful. Thoughts?

Re: Any better solution for my case?

Posted by Jarrod Roberson <ja...@vertigrated.com>.

On Sun, May 16, 2010 at 11:23 PM, Alexander Uvarov <
alexander.uvarov@gmail.com> wrote:

>
> On 17.05.2010, at 6:58, Jarrod Roberson wrote:
> >
> > you don't understand my approach, the list function doesn't apply any rules
> > it just merges the duplicate documents, it is exactly the same thing that a
> > RDBMS or say Lucene would do.
>
> Thanks, now it seems I got it. Sleep deprivation is not cool :(
>
> >
> > your View function would look something like this:
> > I don't know what pages_count might need to be queried on so I skipped it,
> > exercise for the reader :-) I am sure you could apply the same logic to
> > pages_count that I applied to tags.
>
> Not applicable. "pages_count": { "in": [1, 500]} is a range.
>
> emit(['pages_count',doc.pages_count],null);
>
> '{"keys":[['pages_count',1], ['pages_count',2], ['pages_count',3],...
> ['pages_count',500!!!]]}' is unacceptable. Unfortunately there is no range
> support in couchdb.
>
>

you could do a separate query against a pages_count view with a range via
startkey=1&endkey=500 and that would give range support, but you would
still have to merge those ids on the client.
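
A minimal sketch of that client-side step. It assumes that "merge" means keeping only the ids that satisfy every criterion (an intersection); the helper names rangeQuery and intersectIds are hypothetical:

```javascript
// Build the query string for a range request against a view keyed by
// pages_count. CouchDB expects startkey/endkey values to be JSON-encoded.
function rangeQuery(min, max) {
  return 'startkey=' + encodeURIComponent(JSON.stringify(min)) +
         '&endkey=' + encodeURIComponent(JSON.stringify(max));
}

// Keep only the ids that occur in every result list, without duplicates.
function intersectIds(idLists) {
  if (idLists.length === 0) return [];
  return idLists.reduce(function (acc, ids) {
    // Narrow the running result to ids also present in the next list.
    return acc.filter(function (id) { return ids.indexOf(id) !== -1; });
  }).filter(function (id, i, all) {
    // Drop repeated ids, keeping the first occurrence.
    return all.indexOf(id) === i;
  });
}
```

One id list would come from the multi-key query and another from the range query; if any single matching criterion is enough, a union would replace the intersection.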

Re: Any better solution for my case?

Posted by Alexander Uvarov <al...@gmail.com>.

On 17.05.2010, at 6:58, Jarrod Roberson wrote:
> 
> you don't understand my approach, the list function doesn't apply any rules
> it just merges the duplicate documents, it is exactly the same thing that a
> RDBMS or say Lucene would do.

Thanks, now it seems I got it. Sleep deprivation is not cool :(

> 
> your View function would look something like this:
> I don't know what pages_count might need to be queried on so I skipped it,
> exercise for the reader :-) I am sure you could apply the same logic to
> pages_count that I applied to tags.

Not applicable. "pages_count": { "in": [1, 500]} is a range.

emit(['pages_count',doc.pages_count],null);

'{"keys":[['pages_count',1], ['pages_count',2], ['pages_count',3],... ['pages_count',500!!!]]}' is unacceptable. Unfortunately there is no range support in couchdb.

The following is also unacceptable, turns into millions of keys, don't forget float values:

for (var i = 0; i < pages_count; i++) {
  emit(...)
}

> 
> function(doc)
> {
>    emit(['type',doc.type],null);
>    emit(['color',doc.color],null);
>    emit(['condition',doc.condition],null);
>    for (var t in doc.tags)
>    {
>        emit(['tag',doc.tags[t]], null);
>    }
> }
> 
> of course you could do doctype checking if you have multiple types of
> documents and checking to see if each property actually exists but I didn't
> want to muddy the water with boilerplate code.
> then you use the generic merge list List function I wrote that reduces all
> the results down to one unique set of ids/documents.
> 
> something like this
> 
> curl -X POST \
>   'http://localhost:5984/yourdatabase/_design/yourdesigndoc/_list/merge_search/search_criteria?include_docs=true' \
>   -H 'Content-Type: application/json' \
>   -d '{"keys":[["tag","cool"],["tag","awesome"],["color","black"],["type","item"],["condition","mint"]]}'
> 
> the one assumption I make is that I want the docs back on the queries, it
> would be easy enough to change the list function to optionally process the
> docs if they don't exist and only return back a unique set of keys.
> This is a generic way to search without having to have custom views for each
> field you want to search.

> Since you have a single user per database, you could just create permanent
> views on demand, it would take time to build the indexes based on how big
> each database was and it could bloat the database with unnecessary
> duplication

This is what I consider the best solution. The only thing I don't like is generating JavaScript code for the map function.
The ability to pass extra parameters from the design doc to the map function would be much more elegant.
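
To illustrate the code generation approach, here is a minimal sketch. The generator name buildMapSource is hypothetical, and emitting doc._id as the key is an assumption; the criteria format follows the collection example from the first message:

```javascript
// Hypothetical generator: build the source of a map function that emits
// only the documents matching one collection's criteria.
function buildMapSource(criteria) {
  var tests = [];
  Object.keys(criteria).forEach(function (field) {
    var value = criteria[field];
    var ref = 'doc[' + JSON.stringify(field) + ']';
    if (Array.isArray(value)) {
      // Every listed value must be present in the document's array.
      value.forEach(function (v) {
        tests.push('Array.isArray(' + ref + ') && ' + ref +
                   '.indexOf(' + JSON.stringify(v) + ') !== -1');
      });
    } else if (value !== null && typeof value === 'object' &&
               Array.isArray(value['in'])) {
      // {"in": [min, max]} becomes an inclusive range check.
      tests.push(ref + ' >= ' + JSON.stringify(value['in'][0]) +
                 ' && ' + ref + ' <= ' + JSON.stringify(value['in'][1]));
    } else {
      // Plain value: strict equality.
      tests.push(ref + ' === ' + JSON.stringify(value));
    }
  });
  var condition = tests.length ? '(' + tests.join(') && (') + ')' : 'true';
  return 'function(doc) { if (' + condition + ') { emit(doc._id, null); } }';
}
```

The returned string would be stored as the "map" field of a view in the design doc, one view per collection. All values are embedded with JSON.stringify, so user-supplied criteria cannot inject arbitrary code into the generated function.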

> , but this generic fashion avoids having all that duplicate data
> where they just have say a single tag difference in the "criteria".

Storage is not a scarce resource.

> 
> you could even make the index marginally smaller by just using single
> character names for the field names if there were "lots" of documents to
> index t instead of tags, c1 instead of color, c2 instead of condition.

Re: Any better solution for my case?

Posted by Jarrod Roberson <ja...@vertigrated.com>.

> > How so? I have tested it with 100k's of documents and didn't see any
> > performance problems
>
> With this approach operation turns into:
>
> 1. GET criteria document,
> 2. Pass it to the list function,
> 3. List function will filter each key by applying rules, sounds like a
> temp-view for each request
>
> Instead of just querying a view.
>
> Simply, I don't need "ad hoc" queries; this is not my case. I need "ad hoc"
> views. All criteria are known at indexing time.
> It's much easier to just generate code for the map function, but not so
> elegant.


you don't understand my approach: the list function doesn't apply any rules,
it just merges the duplicate documents. It is exactly the same thing that an
RDBMS or, say, Lucene would do.

your View function would look something like this:
I don't know what pages_count might need to be queried on so I skipped it,
exercise for the reader :-) I am sure you could apply the same logic to
pages_count that I applied to tags.

function(doc)
{
    emit(['type',doc.type],null);
    emit(['color',doc.color],null);
    emit(['condition',doc.condition],null);
    for (var t in doc.tags)
    {
        emit(['tag',doc.tags[t]], null);
    }
}

of course you could do doctype checking if you have multiple types of
documents, and check that each property actually exists, but I didn't
want to muddy the water with boilerplate code.
then you use the generic merge List function I wrote, which reduces all
the results down to one unique set of ids/documents.
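
For illustration, the doctype and property checks mentioned above might look roughly like this. This is a sketch, not the code from the blog post: the 'Item' doctype check is an assumption, and emit() is stubbed here only so the function can run outside CouchDB's view server:

```javascript
// Stand-in for CouchDB's emit(), so the sketch can run outside the view server.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// The pivot map function with the skipped checks filled in.
function map(doc) {
  // Assumed doctype check: index only documents of type "Item".
  if (doc.type !== 'Item') return;

  // Emit one [field, value] row per scalar property that actually exists.
  var fields = ['type', 'color', 'condition'];
  for (var i = 0; i < fields.length; i++) {
    var name = fields[i];
    if (doc[name] !== undefined && doc[name] !== null) {
      emit([name, doc[name]], null);
    }
  }

  // Emit one ['tag', value] row per tag, if the document has a tags array.
  if (Array.isArray(doc.tags)) {
    for (var t = 0; t < doc.tags.length; t++) {
      emit(['tag', doc.tags[t]], null);
    }
  }
}
```

Inside a design doc only the body of map would be stored; the emit stub is not needed there.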

something like this

curl -X POST \
  'http://localhost:5984/yourdatabase/_design/yourdesigndoc/_list/merge_search/search_criteria?include_docs=true' \
  -H 'Content-Type: application/json' \
  -d '{"keys":[["tag","cool"],["tag","awesome"],["color","black"],["type","item"],["condition","mint"]]}'

the one assumption I make is that I want the docs back on the queries, it
would be easy enough to change the list function to optionally process the
docs if they don't exist and only return back a unique set of keys.
This is a generic way to search without having to have custom views for each
field you want to search.
Since you have a single user per database, you could just create permanent
views on demand. It would take time to build the indexes, depending on how big
each database is, and it could bloat the database with unnecessary
duplication, but this generic fashion avoids having all that duplicate data
where they just have, say, a single tag difference in the "criteria".
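
A rough idea of what such a merging list function could look like. This is a sketch under assumptions, not the actual merge_search implementation: the names uniqueRows and mergeList are hypothetical, and the output shape is invented for illustration. getRow() and send() are the functions CouchDB makes available inside list functions:

```javascript
// Collapse view rows that share a document id, keeping the first occurrence.
function uniqueRows(rows) {
  var seen = {};
  var result = [];
  rows.forEach(function (row) {
    if (!Object.prototype.hasOwnProperty.call(seen, row.id)) {
      seen[row.id] = true;
      // Keep the doc if include_docs=true was used, otherwise just the id.
      result.push(row.doc ? { id: row.id, doc: row.doc } : { id: row.id });
    }
  });
  return result;
}

// Assumed shape of the list function: read every row from the view,
// then send back the de-duplicated set as JSON.
function mergeList(head, req) {
  var rows = [];
  var row;
  while ((row = getRow())) {
    rows.push(row);
  }
  send(JSON.stringify({ rows: uniqueRows(rows) }));
}
```

Note that this collects all rows in memory before sending, which is fine for a sketch but may matter for very large result sets.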

you could even make the index marginally smaller by using single-character
names for the fields if there were "lots" of documents to index: t instead
of tags, c1 instead of color, c2 instead of condition.

Re: Any better solution for my case?

Posted by Alexander Uvarov <al...@gmail.com>.

On 17.05.2010, at 4:42, Jarrod Roberson wrote:
>>> 
>>> You can try the approach I used here to do "ad hoc" queries using what I
>>> call a "pivot" index.
>>> you could easily transform that into a multi-key query and use the same
>>> technique I used.
>>> 
>>> http://www.vertigrated.com/blog/2010/04/generic-ad-hoc-queries-in-couchdb/
>>> 
>> 
>> 
>> I am afraid that this approach can significantly break my server's
>> performance.
>> 
> 
> How so? I have tested it with 100k's of documents and did not see any
> performance problems

With this approach operation turns into:

1. GET criteria document,
2. Pass it to the list function,
3. List function will filter each key by applying rules, which sounds like a temp-view for each request

Instead of just querying a view.

Simply, I don't need "ad hoc" queries; this is not my case. I need "ad hoc" views. All criteria are known at indexing time.
It's much easier to just generate code for the map function, but not so elegant.

Re: Any better solution for my case?

Posted by Jarrod Roberson <ja...@vertigrated.com>.

On May 16, 2010, at 6:30 PM, Alexander Uvarov <alexander.uvarov@gmail.com> wrote:

>
> On 17.05.2010, at 2:12, Jarrod Roberson wrote:
>
>> On Sun, May 16, 2010 at 3:33 PM, Alexander Uvarov <
>> alexander.uvarov@gmail.com> wrote:
>>
>>> Currently I see the one and only solution -- just use couchdb-lucene and
>>> transform criterions to actual lucene query string.
>>>
>>> Any better ideas?
>>>
>>
>> You can try the approach I used here to do "ad hoc" queries using what I
>> call a "pivot" index.
>> you could easily transform that into a multi-key query and use the same
>> technique I used.
>>
>> http://www.vertigrated.com/blog/2010/04/generic-ad-hoc-queries-in-couchdb/
>>
>
>
> I am afraid that this approach can significantly break my server's
> performance.
>

How so? I have tested it with 100k's of documents and did not see any
performance problems.

Re: Any better solution for my case?

Posted by Alexander Uvarov <al...@gmail.com>.

On 17.05.2010, at 2:12, Jarrod Roberson wrote:

> On Sun, May 16, 2010 at 3:33 PM, Alexander Uvarov <
> alexander.uvarov@gmail.com> wrote:
> 
>> Currently I see the one and only solution -- just use couchdb-lucene and
>> transform criterions to actual lucene query string.
>> 
>> Any better ideas?
>> 
> 
> You can try the approach I used here to do "ad hoc" queries using what I
> call a "pivot" index.
> you could easily transform that into a multi-key query and use the same
> technique I used.
> 
> http://www.vertigrated.com/blog/2010/04/generic-ad-hoc-queries-in-couchdb/
> 


I am afraid that this approach can significantly break my server's performance.

Re: Any better solution for my case?

Posted by Jarrod Roberson <ja...@vertigrated.com>.

On Sun, May 16, 2010 at 3:33 PM, Alexander Uvarov <
alexander.uvarov@gmail.com> wrote:

> Each user has their own database. Users can store documents. Documents have
> many predefined and custom parameters, along with tags. Users should be able
> to create so-called "collections" of their documents. By a collection I mean a
> set of criteria, so a particular collection would look like:
>
> {
>  "type": "Item",
>  "color": "black",
>  "condition": "mint",
>  "pages_count": { "in": [1, 300] },
>  "tags": ["cool", "awesome", "sweet"]
> }
>
> Currently I see the one and only solution -- just use couchdb-lucene and
> transform criterions to actual lucene query string.
>
> Any better ideas?
>

You can try the approach I used here to do "ad hoc" queries using what I
call a "pivot" index.
you could easily transform that into a multi-key query and use the same
technique I used.

http://www.vertigrated.com/blog/2010/04/generic-ad-hoc-queries-in-couchdb/


-- 
Jarrod Roberson
www.vertigrated.com/blog/