Posted to user@couchdb.apache.org by Markus Wolff <ma...@wolff-hamburg.de> on 2009/11/07 19:25:33 UTC

Brain check on CouchDB views

Hi List,

just to confirm that my brain is not completely dead, I'd like to check
with the experts if I've gotten the concepts right.

Say, I want to build an application that has users and tags. Very
original, I know, nobody ever did that before :-P

Now I need the following functionality:

- Display a global tag cloud with weighted tags of all users
- Display a tag cloud for each user with just the weighted tags
  of that user
- Display a list of documents by user "A", having tag "X"
- Display a list of documents by any user, having tag "X"
- Display a list of documents by any user, having tag "B" AND "Y"!

From what I understand, I can achieve almost all of these
functionalities with just two views.

Given that a document looks like this:

{ _id: "some_id", user: "A", tags: ["X","Y","Z"] }

...for the main "by_user" view, my map function would look something
like this:

function(doc) {
  if (doc.tags && doc.tags.length > 0) {
    doc.tags.forEach(function(tag) {
      emit([doc.user,tag], 1);
    });
  }
}

...and my reduce function something like this:

function (keys, values) {
  return sum(values);
}
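
For anyone wondering why this reduce needs no special rereduce handling, here is a minimal standalone sketch. It assumes the usual three-argument reduce signature (keys, values, rereduce); sum() is defined locally only so the snippet runs outside CouchDB, and the sample keys and document ids are made up.

```javascript
// sum() is provided by CouchDB's view server; it is defined here only so
// the sketch runs standalone.
function sum(values) {
  return values.reduce(function (total, v) { return total + v; }, 0);
}

// The reduce function from the post, with the third parameter spelled out.
function reduceCount(keys, values, rereduce) {
  return sum(values);
}

// First pass: two chunks of emitted rows are reduced separately.
// Each entry in keys is a [key, docid] pair (sample data, made up).
const partialA = reduceCount([[["A", "X"], "d1"], [["A", "X"], "d2"]], [1, 1], false);
const partialB = reduceCount([[["A", "X"], "d3"]], [1], false);

// Rereduce pass: keys is null and values holds the earlier results.
const combined = reduceCount(null, [partialA, partialB], true);
console.log(combined); // 3
```

Because adding up partial sums gives the same answer as adding up the raw values, the same function body works for both passes.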

At first, I experimented a fair bit with fancy rereduce magic, never
getting the results I wanted, until I discovered that I could achieve
nearly all I needed just by applying smarter query parameters:

- Display a tag cloud for each user with just the weighted tags
  of that user

  by_user?group=true&startkey=["A"]&endkey=["A",{}]
  
- Display a list of documents by user "A", having tag "X"

  by_user?reduce=false&startkey=["A", "X"]&endkey=["A","X"]
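
To make the two queries above more concrete, here is a rough plain-JavaScript simulation of what they return. It is not CouchDB itself: the sample documents are made up, emit is passed in explicitly, and the key range is approximated by a simple filter on the first key element.

```javascript
// Sample documents, made up for illustration.
const docs = [
  { _id: "d1", user: "A", tags: ["X", "Y"] },
  { _id: "d2", user: "A", tags: ["X"] },
  { _id: "d3", user: "B", tags: ["X", "Z"] },
];

// Same logic as the by_user map function, with emit passed in explicitly.
function mapByUser(doc, emit) {
  if (doc.tags && doc.tags.length > 0) {
    doc.tags.forEach(function (tag) {
      emit([doc.user, tag], 1);
    });
  }
}

// Collect the rows the map function would emit for every document.
function buildRows(allDocs) {
  const rows = [];
  allDocs.forEach(function (doc) {
    mapByUser(doc, function (key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  });
  return rows;
}

// Rough equivalent of group=true with startkey=[user]&endkey=[user,{}]:
// keep one user's rows and sum the values per distinct tag.
function tagCloudForUser(rows, user) {
  const counts = {};
  rows
    .filter(function (row) { return row.key[0] === user; })
    .forEach(function (row) {
      const tag = row.key[1];
      counts[tag] = (counts[tag] || 0) + row.value;
    });
  return counts;
}

// Rough equivalent of reduce=false with startkey and endkey both [user, tag].
function docsForUserAndTag(rows, user, tag) {
  return rows
    .filter(function (row) { return row.key[0] === user && row.key[1] === tag; })
    .map(function (row) { return row.id; });
}

const byUserRows = buildRows(docs);
console.log(tagCloudForUser(byUserRows, "A"));        // { X: 2, Y: 1 }
console.log(docsForUserAndTag(byUserRows, "A", "X")); // [ 'd1', 'd2' ]
```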

Now, for the functionalities applying to all users instead of individual
ones, I would have to make a second view (let's call that
"global_tags"). The reduce function could be the same as in the
"by_user" view, just the map function would be very slightly different
(emit only the tag as key, ignore the user):

function(doc) {
  if (doc.tags && doc.tags.length > 0) {
    doc.tags.forEach(function(tag) {
      emit(tag, 1);
    });
  }
}
 
Now I could resolve most of the remaining functionality like this:
 
- Display a global tag cloud with weighted tags of all users

  global_tags?group=true

- Display a list of documents by any user, having tag "X"

  global_tags?reduce=false&startkey="X"&endkey="X"  
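
One practical detail when sending these queries: key, startkey and endkey are JSON values, so they have to be JSON-encoded and then URL-encoded. Below is a small sketch of a URL builder. The host, the database name "mydb" and the design document name "tags" are placeholders I made up; only the view names come from this post. It also uses the key parameter, which is equivalent to giving the same value as startkey and endkey.

```javascript
// Hypothetical names: "mydb" and "tags" are placeholders for the database
// and design document; by_user and global_tags are the views from this post.
const VIEW_BASE = "http://localhost:5984/mydb/_design/tags/_view/";

// These parameters carry JSON values; everything else is sent as-is.
const JSON_PARAMS = ["key", "startkey", "endkey"];

function viewUrl(viewName, params) {
  const query = Object.keys(params).map(function (name) {
    const raw = JSON_PARAMS.indexOf(name) !== -1
      ? JSON.stringify(params[name])
      : String(params[name]);
    return name + "=" + encodeURIComponent(raw);
  });
  return VIEW_BASE + viewName + (query.length ? "?" + query.join("&") : "");
}

const userCloudUrl = viewUrl("by_user", { group: true, startkey: ["A"], endkey: ["A", {}] });
const taggedDocsUrl = viewUrl("global_tags", { reduce: false, key: "X" });
console.log(userCloudUrl);
console.log(taggedDocsUrl);
```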

The only feature left unresolved is querying for tags with
boolean conditions:

- Display a list of documents by any user, having tag "B" AND "Y"!

From what I can see, this cannot be resolved with a simple CouchDB view
(except if the condition was OR and the tags were directly adjacent in
the sorting order, but that should be a rare coincidence). Instead, I'd
have to use an external search index like Lucene and perform my search
queries on that instead.

Does all this sound about right? Are there even simpler ways to go about
the by_user/global_tags features, like using only one view instead of
two separate ones (I've tried, but didn't succeed)?

Are there ways to use boolean queries without an external search index,
that I just can't find?

Thanks for your time!

CU
 Markus


Re: no distributing couchdb?

Posted by Brian Candler <B....@pobox.com>.
On Sat, Nov 14, 2009 at 11:50:01AM -0800, James Leek wrote:
> view computation is not distributed like  
> google's MapReduce is.  Is this correct?

Basically - although I believe there are projects which can combine multiple
database "shards" into a single view.

But the big difference with CouchDB's Map/Reduce is that it is persistent
and incrementally updated. That is, if you import 10 million documents then
the first view computation will be slow. However, after that you'll get
instant answers - and if you add or update 1,000 documents then only those
documents are mapped again.
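
As a conceptual illustration of that incremental behaviour, here is a toy index that re-runs the map function only for documents that are added or changed. This is only a sketch: CouchDB keeps view rows and intermediate reductions in an on-disk B-tree, whereas this version uses an in-memory Map, and all names in it are made up.

```javascript
// Toy model only: rows are kept in a Map keyed by document id.
function createIncrementalIndex(mapFn) {
  const rowsByDoc = new Map(); // doc id -> rows emitted for that doc
  let mapCalls = 0;            // how many documents have been (re)mapped

  return {
    // Index a new or changed document; other documents are not re-mapped.
    update: function (doc) {
      const rows = [];
      mapFn(doc, function (key, value) { rows.push({ key: key, value: value }); });
      rowsByDoc.set(doc._id, rows); // replaces rows from an older revision
      mapCalls += 1;
    },
    // Sum of all emitted values. This toy version rescans the stored rows;
    // it does not cache partial reductions.
    total: function () {
      let sum = 0;
      rowsByDoc.forEach(function (rows) {
        rows.forEach(function (row) { sum += row.value; });
      });
      return sum;
    },
    mapCalls: function () { return mapCalls; },
  };
}

function mapTags(doc, emit) {
  (doc.tags || []).forEach(function (tag) { emit(tag, 1); });
}

const index = createIncrementalIndex(mapTags);
index.update({ _id: "d1", tags: ["X", "Y"] });
index.update({ _id: "d2", tags: ["X"] });
index.update({ _id: "d1", tags: ["X"] }); // d1 changed: only d1 is mapped again
console.log(index.total());    // 2
console.log(index.mapCalls()); // 3
```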

Re: no distributing couchdb?

Posted by Adam Kocoloski <ko...@apache.org>.
On Nov 14, 2009, at 2:50 PM, James Leek wrote:

> Hi, I'm still a little new to couchdb.  I'm still trying to understand its limits and abilities, and I haven't been able to find anything that definitively answers this question for me.
> 
> It seems to me that couchdb is not really what one would call a distributed database.  You can replicate a database to different nodes (computers) and sync them, but view computation is not distributed like google's MapReduce is.  Is this correct?
> 
> Thanks,
> Jim

Hi Jim, at the moment this is true.  However, there is Meebo's CouchDB-Lounge[1] proxy system, which adds this functionality in front of a cluster of standalone CouchDB servers.  There is also ongoing work inside Cloudant[2], a portion of which will be open-sourced, to build a robust sharding system inside CouchDB itself.

Best,

Adam

[1]: http://tilgovi.github.com/couchdb-lounge/
[2]: http://cloudant.com/


Re: no distributing couchdb?

Posted by Oliver Boermans <bo...@gmail.com>.
Hi Jim,
The concept you may be missing is commonly called "Eventual
Consistency". The CouchDB book has a dedicated chapter on it:
http://books.couchdb.org/relax/intro/eventual-consistency

Ollie
@ollicle

2009/11/15 James Leek <le...@llnl.gov>:
> It seems to me that couchdb is not really what one would call a distributed
> database.  You can replicate a database to different nodes (computers) and
> sync them, but view computation is not distributed like google's MapReduce
> is.  Is this correct?

no distributing couchdb?

Posted by James Leek <le...@llnl.gov>.
Hi, I'm still a little new to couchdb.  I'm still trying to understand 
its limits and abilities, and I haven't been able to find anything that 
definitively answers this question for me.

It seems to me that couchdb is not really what one would call a 
distributed database.  You can replicate a database to different nodes 
(computers) and sync them, but view computation is not distributed like 
google's MapReduce is.  Is this correct?

Thanks,
Jim

RE: Brain check on CouchDB views

Posted by Nils Breunese <N....@vpro.nl>.
You can create a view which has the tag_count as a key. If you don't want to do that but still want to sort by value, you might want to take a look at couchdb-footrest: http://github.com/assembly/couchdb-footrest I haven't gotten couchdb-footrest to work yet, but it does promise sorting by value (probably with a performance hit somewhere, as values are not indexed).

Nils Breunese.

________________________________________
From: Markus Wolff [markus@wolff-hamburg.de]
Sent: Friday, 13 November 2009 0:19
To: user@couchdb.apache.org
Subject: Re: Brain check on CouchDB views

Hi Smrchy,

since the manual states that the query option "descending" only refers
to the key order, and I haven't found anything about ordering by value,
I'd assume that you'd have to do this last step within your application.

In other words, there is no built-in CouchDB equivalent to the SQL
expression "ORDER BY tag_count DESC LIMIT 50".

Instead, you'd have to take the result for all tags, sort it by value in
reverse order in your application, and then throw away everything but
the first 50 elements.

In PHP, for example, that would look like:

arsort($couchdbResultArray); // sorts in place by value, descending; returns a bool
$top50 = array_slice($couchdbResultArray, 0, 50, true);

If there is a way to do it natively in CouchDB, I don't know it yet
(which doesn't mean much ;-)).

CU
  Markus


Smrchy wrote:
> Hi Markus,
> those simple steps you take are very interesting - i was also running into
> those big reduce problems. Though i admit i am still far away from
> understanding everything about reduce and rereduce.
>
> In your example how would you give back only the top 50 tags out of all your
> tags in your
>
> global_tags?group=true
>
> Is there an easy way for this?
>
> Patrick
>
> On Wed, Nov 8, 2006 at 5:41 PM, Markus Wolff <ma...@wolff-hamburg.de>wrote:
>
>> Paul Davis wrote:
>>
>>> Markus,
>>>
>>> Your thinking is spot on.
>>>
>> Wow, that's a first :-P
>>
>>
>>> The short answer is no. The long answer is probably not unless you
>>> know which tags you want in the OR beforehand when you build the view.
>>> The only other way is to use an external indexer.
>>>
>>> The reasoning is that boolean logic like this is gonna require
>>> multiple index traversals which CouchDB doesn't allow. As such, the
>>> best answer is either do the logic client side with some effort, or
>>> use an external indexer like couchdb-lucene.
>>>
>> Thanks for clearing that up, having confirmed that, I feel confident to
>> proceed in my endeavours ;-)
>>
>> CU
>>  Markus
>>
>



Re: Brain check on CouchDB views

Posted by Markus Wolff <ma...@wolff-hamburg.de>.
Hi Smrchy,

since the manual states that the query option "descending" only refers 
to the key order, and I haven't found anything about ordering by value, 
I'd assume that you'd have to do this last step within your application.

In other words, there is no built-in CouchDB equivalent to the SQL 
expression "ORDER BY tag_count DESC LIMIT 50".

Instead, you'd have to take the result for all tags, sort it by value in 
reverse order in your application, and then throw away everything but 
the first 50 elements.

In PHP, for example, that would look like:

arsort($couchdbResultArray); // sorts in place by value, descending; returns a bool
$top50 = array_slice($couchdbResultArray, 0, 50, true);
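
For comparison, here is the same client-side step sketched in JavaScript. It assumes rows shaped like the "rows" array of a group=true response (objects with key and value); the sample data and the function name are made up.

```javascript
// Rows shaped like the "rows" array of a group=true view response.
// The sample data is made up for illustration.
const groupedRows = [
  { key: "couchdb", value: 12 },
  { key: "php", value: 7 },
  { key: "lucene", value: 30 },
];

// Sort a copy by value, largest first, and keep the first `limit` rows.
function topTags(rows, limit) {
  return rows
    .slice()
    .sort(function (a, b) { return b.value - a.value; })
    .slice(0, limit);
}

console.log(topTags(groupedRows, 2).map(function (row) { return row.key; }));
// [ 'lucene', 'couchdb' ]
```

Sorting a copy leaves the original response untouched, which helps if the same rows are also rendered in key order elsewhere.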

If there is a way to do it natively in CouchDB, I don't know it yet 
(which doesn't mean much ;-)).

CU
  Markus


Smrchy wrote:
> Hi Markus,
> those simple steps you take are very interesting - i was also running into
> those big reduce problems. Though i admit i am still far away from
> understanding everything about reduce and rereduce.
> 
> In your example how would you give back only the top 50 tags out of all your
> tags in your
> 
> global_tags?group=true
> 
> Is there an easy way for this?
> 
> Patrick
> 
> On Wed, Nov 8, 2006 at 5:41 PM, Markus Wolff <ma...@wolff-hamburg.de>wrote:
> 
>> Paul Davis wrote:
>>
>>> Markus,
>>>
>>> Your thinking is spot on.
>>>
>> Wow, that's a first :-P
>>
>>
>>> The short answer is no. The long answer is probably not unless you
>>> know which tags you want in the OR beforehand when you build the view.
>>> The only other way is to use an external indexer.
>>>
>>> The reasoning is that boolean logic like this is gonna require
>>> multiple index traversals which CouchDB doesn't allow. As such, the
>>> best answer is either do the logic client side with some effort, or
>>> use an external indexer like couchdb-lucene.
>>>
>> Thanks for clearing that up, having confirmed that, I feel confident to
>> proceed in my endeavours ;-)
>>
>> CU
>>  Markus
>>
> 


Re: Brain check on CouchDB views

Posted by Smrchy <sm...@gmail.com>.
Hi Markus,
those simple steps you take are very interesting - I was also running into
those big reduce problems. Though I admit I am still far away from
understanding everything about reduce and rereduce.

In your example, how would you give back only the top 50 tags out of all your
tags in your

global_tags?group=true

Is there an easy way for this?

Patrick

On Wed, Nov 8, 2006 at 5:41 PM, Markus Wolff <ma...@wolff-hamburg.de>wrote:

> Paul Davis wrote:
>
>> Markus,
>>
>> Your thinking is spot on.
>>
>
> Wow, that's a first :-P
>
>
>> The short answer is no. The long answer is probably not unless you
>> know which tags you want in the OR beforehand when you build the view.
>> The only other way is to use an external indexer.
>>
>> The reasoning is that boolean logic like this is gonna require
>> multiple index traversals which CouchDB doesn't allow. As such, the
>> best answer is either do the logic client side with some effort, or
>> use an external indexer like couchdb-lucene.
>>
>
> Thanks for clearing that up, having confirmed that, I feel confident to
> proceed in my endeavours ;-)
>
> CU
>  Markus
>

Re: Brain check on CouchDB views

Posted by Paul Davis <pa...@gmail.com>.
Markus,

Your thinking is spot on.

> - Display a list of documents by any user, having tag "B" AND "Y"!
>
> From what I can see, this cannot be resolved with a simple CouchDB view
> (except if the condition was OR and the tags were directly adjacent in
> the sorting order, but that should be a rare coincidence). Instead, I'd
> have to use an external search index like Lucene and perform my search
> queries on that instead.
>
> Does all this sound about right? Are there even simpler ways to go about
> the by_user/global_tags features, like using only one view instead of
> two separate ones (I've tried, but didn't succeed)?

The short answer is no. The long answer is probably not unless you
know which tags you want in the OR beforehand when you build the view.
The only other way is to use an external indexer.

The reasoning is that boolean logic like this is gonna require
multiple index traversals which CouchDB doesn't allow. As such, the
best answer is either do the logic client side with some effort, or
use an external indexer like couchdb-lucene.
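
A minimal sketch of the client-side route for "B" AND "Y": run one reduce=false query per tag against global_tags, then intersect the document ids. The function name and the sample rows are made up; the sketch assumes each row carries the id field that view rows have when reduce=false.

```javascript
// Each argument is the "rows" array from one reduce=false query against
// global_tags, one query per tag.
function intersectDocIds(rowsForFirstTag, rowsForSecondTag) {
  const idsInSecond = new Set(rowsForSecondTag.map(function (row) { return row.id; }));
  const seen = new Set(); // guards against a doc listing the same tag twice
  const result = [];
  rowsForFirstTag.forEach(function (row) {
    if (idsInSecond.has(row.id) && !seen.has(row.id)) {
      seen.add(row.id);
      result.push(row.id);
    }
  });
  return result;
}

// Sample rows, made up for illustration.
const rowsB = [
  { id: "d1", key: "B", value: 1 },
  { id: "d2", key: "B", value: 1 },
  { id: "d4", key: "B", value: 1 },
];
const rowsY = [
  { id: "d2", key: "Y", value: 1 },
  { id: "d3", key: "Y", value: 1 },
  { id: "d4", key: "Y", value: 1 },
];

console.log(intersectDocIds(rowsB, rowsY)); // [ 'd2', 'd4' ]
```

This costs one index traversal per tag plus the work of holding both result sets in the client, so it is only reasonable while the per-tag result sets stay small.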

HTH,
Paul Davis