Posted to user@couchdb.apache.org by Nicolas Clairon <cl...@gmail.com> on 2009/02/12 11:42:25 UTC

playing with tags

Hi there !

I've been playing with tags lately and a question came up.
For example, I have a bunch of blog articles:

article1 = {
  ...snip...,
  tags : ["couchdb", "python", "best practices"],
}

article2 = {
    ...snip...,
    tags : ["python", "best practices"],
}

article3 = {
    ...snip...,
    tags : ["couchdb", "best practices"],
}

and a view which emits tags:

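function(doc) {
    // Index each article once per tag so it can be looked up by tag.
    if (doc.type == "article" && doc.tags) {
        // Use an indexed loop; for...in on an array can pick up extra properties.
        for (var i = 0; i < doc.tags.length; i++) {
            emit(doc.tags[i], doc);
        }
    }
}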

We can get all articles which are tagged with "couchdb" easily:

http://localhost:5984/blog/_view/article/by_tag?key="couchdb"

but now, I want all articles which are tagged with "couchdb" *and* "python"
(that is, only article1). Is there a way to do this directly with CouchDB views?
Something like this:

http://localhost:5984/blog/_view/article/by_tag?key_in=["couchdb", "python"]

For the moment, I have to do it in my program, querying the view twice and
merging the results...

We could also imagine something like this:

http://localhost:5984/blog/_view/article/by_tag?onekey_in=["couchdb", "python"]

which would return all articles tagged with "couchdb" *or* "python"...

Does this already exist in CouchDB?

Re: playing with tags

Posted by Jeremy Wall <jw...@google.com>.
Why would you lose the ability to map through the data? Nothing prevents the
map from working with both kinds of document. What exactly do you see
yourself losing there?


On Feb 12, 2009 9:47 AM, "Dean Landolt" <de...@deanlandolt.com> wrote:

On Thu, Feb 12, 2009 at 5:42 AM, Nicolas Clairon <cl...@gmail.com> wrote:
> Hi there ! > > I'm pl...
This is an issue that has been nagging at me lately. Storing tags in the doc
seems like a recipe for disaster (that is, if you consider view contention
disaster). I would argue that tags (and other readily changeable
user-specific state information like read/unread, favorite, blah blah)
should be kept in separate docs and bridged together in views. Of course,
doing this means you lose the ability to map through this data (e.g. no tag
clouds), so it's a lose/lose right now. This is something I just can't
figure out how to get around without a map/reduce/map kind of paradigm.

Paul Davis mentioned some ideas about value indexes and view intersections
that may help here. Does anyone else have opinions on how best to design for
this? Someone floated the article *Accountants Don't Use Erasers* [1]
recently that further convinced me this kind of data should be external to
the doc in question, but I'm at a loss for how to do this without
sacrificing functionality.

[1]
http://blogs.msdn.com/pathelland/archive/2007/06/14/accountants-don-t-use-erasers.aspx

Re: playing with tags

Posted by Kerr Rainey <ke...@gmail.com>.
2009/2/12 Dean Landolt <de...@deanlandolt.com>:
> Perhaps this does belong in a different thread, but the example just
> reminded me of the general design question -- AND semantics and view
> intersections by value are a pretty big part of it.

It would certainly be useful to be able to do this kind of thing, and
you're correct that the problem of linking related docs together, for
whatever reason, is the main reason you'd want it.  It's an interesting
problem.

--
Kerr

Re: playing with tags

Posted by Dean Landolt <de...@deanlandolt.com>.
On Thu, Feb 12, 2009 at 11:16 AM, Kerr Rainey <ke...@gmail.com> wrote:

> 2009/2/12 Dean Landolt <de...@deanlandolt.com>:
> > Perhaps in this scenario, but take a feed reader, for instance. When a user
> > is tagging feeds, it is subject to change at a much different rate than the
> > otherwise static content. Even in a single user scenario, once replication
> > is introduced (say, to a local instance of the app), it's easy to imagine a
> > situation where things get messy. Then imagine read/unread metadata, and all
> > the other types of user data associated with this otherwise static content.
>
> Obviously you need to fit the solution to the problem.  Storing meta
> data in a doc is not inherently a problem.  Storing *any* data that
> has high contention on a single document is a problem, but that's just
> basic CouchDB in the same way that you wouldn't store the comments on
> a single blog post doc.  If your meta data is highly contended then
> you need to find another way round it.  But all this is a non sequitur
> from the problem of getting AND semantics for keys in view lookups.


Perhaps this does belong in a different thread, but the example just
reminded me of the general design question -- AND semantics and view
intersections by value are a pretty big part of it.

Take read/unread state. Obviously you don't want to store this in the doc --
each read/unread event is separate from the doc. So you could keep separate
docs like this:

{
   "_id": "4b3d17d0269e0fa23072abec8859f447",
   "_rev": "1211529288",
   "at": "2009-02-02T08:07:56Z",
   "class": "state",
   "entry": "http://example.com/1",
   "read": false
}

...or some such -- that's just an implementation detail. You can reduce
these state events down into one value (also taking into consideration, for
instance, when an update field on the doc is newer than the latest read
state).
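
As a rough sketch of that reduction, assuming state docs shaped like the
example above: the names latestState and foldStates are made up for
illustration, and this runs as plain JavaScript over an array of docs rather
than as an actual CouchDB reduce function.

```javascript
// Hypothetical helper: pick the newer of two state events.
// ISO 8601 UTC timestamps in the same format compare correctly as strings.
function latestState(a, b) {
    return a.at >= b.at ? a : b;
}

// Fold a list of "state" docs down to one current value per entry.
function foldStates(docs) {
    var current = {};
    docs.forEach(function (doc) {
        if (doc["class"] !== "state") return; // ignore non-state docs
        var seen = current[doc.entry];
        current[doc.entry] = seen ? latestState(seen, doc) : doc;
    });
    return current;
}

// Sample events (made-up data in the shape of the doc above).
var events = [
    { "class": "state", entry: "http://example.com/1", at: "2009-02-02T08:07:56Z", read: false },
    { "class": "state", entry: "http://example.com/1", at: "2009-02-03T10:00:00Z", read: true },
    { "class": "state", entry: "http://example.com/2", at: "2009-02-01T09:00:00Z", read: false }
];

var state = foldStates(events);
console.log(state["http://example.com/1"].read); // true
console.log(state["http://example.com/2"].read); // false
```

Keeping every event and folding at read time means no existing doc is ever
rewritten, which is the point of keeping this state outside the entry doc.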

If you just want to get all unread feeds, AND semantics would be pretty damn
helpful (yes, I know you could do a multi-key get and filter at the client).
But when a view needs to know which feeds are unread, that reduced state is
unavailable to it (that's what I meant by map/reduce/map). Paul's ideas about
value indexes are one possible way to get around this. I was just wondering
whether there was another approach I'm missing.

Re: playing with tags

Posted by Kerr Rainey <ke...@gmail.com>.
2009/2/12 Dean Landolt <de...@deanlandolt.com>:
> Perhaps in this scenario, but take a feed reader, for instance. When a user
> is tagging feeds, it is subject to change at a much different rate than the
> otherwise static content. Even in a single user scenario, once replication
> is introduced (say, to a local instance of the app), it's easy to imagine a
> situation where things get messy. Then imagine read/unread metadata, and all
> the other types of user data associated with this otherwise static content.

Obviously you need to fit the solution to the problem.  Storing meta
data in a doc is not inherently a problem.  Storing *any* data that
has high contention on a single document is a problem, but that's just
basic CouchDB in the same way that you wouldn't store the comments on
a single blog post doc.  If your meta data is highly contended then
you need to find another way round it.  But all this is a non sequitur
from the problem of getting AND semantics for keys in view lookups.

--
Kerr

Re: playing with tags

Posted by Dean Landolt <de...@deanlandolt.com>.
On Thu, Feb 12, 2009 at 10:59 AM, Kerr Rainey <ke...@gmail.com> wrote:

> 2009/2/12 Dean Landolt <de...@deanlandolt.com>:
> > This is an issue that has been nagging at me lately. Storing tags in the doc
> > seems like a recipe for disaster (that is, if you consider view contention
> > disaster). I would argue that tags (and other readily changeable
> > user-specific state information like read/unread, favorite, blah blah)
> > should be kept in separate docs and bridged together in views.
>
> Erm... why do you think tags for a blog post would be rapidly changing?
> They would be set by the owner of a post and that's it.  The
> likelihood of a tag changing in this scenario is no greater than the
> post being edited.  It's meta info, but clearly part of the post.  You
> won't have many people try to update the same post, or at least the
> contention on that is very low.
>

Perhaps in this scenario, but take a feed reader, for instance. When a user
is tagging feeds, the tags change at a very different rate than the
otherwise static content. Even in a single-user scenario, once replication
is introduced (say, to a local instance of the app), it's easy to imagine a
situation where things get messy. Then imagine read/unread metadata, and all
the other types of user data associated with this otherwise static content.

Re: playing with tags

Posted by Patrick Antivackis <pa...@gmail.com>.
> but now, I want all articles which are tagged with "couchdb" *and* "python"
> (I want the article1). Is there a method to do it directly with CouchDB
> views?
> Something like that :

The only way to do that is to emit all the possible tag combinations for an
article.
To reduce the number of combinations, first sort the tags alphabetically,
then emit the combinations.

For the query, sort the tags you are looking for alphabetically and then
use a startkey/endkey view retrieval.
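
A minimal sketch of that idea, under a few assumptions: emit is stubbed so
the map function can run outside CouchDB, the combinations are emitted as
array keys, the value is the doc id rather than the whole doc, and the lookup
is an in-memory exact-key match instead of a real startkey/endkey request.

```javascript
// Collect emitted rows in an array so the map function can run outside CouchDB.
var rows = [];
function emit(key, value) {
    rows.push({ key: key, value: value });
}

// Map function sketch: emit every non-empty combination of the sorted tags.
// An article with n tags produces 2^n - 1 rows, so this only suits small n.
function map(doc) {
    if (doc.type !== "article" || !doc.tags) return;
    var tags = doc.tags.slice().sort();
    var total = 1 << tags.length;
    for (var mask = 1; mask < total; mask++) {
        var combo = [];
        for (var i = 0; i < tags.length; i++) {
            if (mask & (1 << i)) combo.push(tags[i]);
        }
        emit(combo, doc._id);
    }
}

map({ _id: "article1", type: "article", tags: ["couchdb", "python", "best practices"] });
map({ _id: "article2", type: "article", tags: ["python", "best practices"] });

// Sort the query tags the same way, then look them up as one exact key.
var wanted = JSON.stringify(["python", "couchdb"].sort());
var hits = rows.filter(function (row) {
    return JSON.stringify(row.key) === wanted;
}).map(function (row) { return row.value; });
console.log(hits); // [ 'article1' ]
```

The index grows exponentially with the number of tags per article, so this
trades index size for single-query AND lookups.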




Re: playing with tags

Posted by Kerr Rainey <ke...@gmail.com>.
2009/2/12 Dean Landolt <de...@deanlandolt.com>:
> This is an issue that has been nagging at me lately. Storing tags in the doc
> seems like a recipe for disaster (that is, if you consider view contention
> disaster). I would argue that tags (and other readily changeable
> user-specific state information like read/unread, favorite, blah blah)
> should be kept in separate docs and bridged together in views.

Erm... why do you think tags for a blog post would be rapidly changing?
They would be set by the owner of a post and that's it.  The
likelihood of a tag changing in this scenario is no greater than the
post being edited.  It's meta info, but clearly part of the post.  You
won't have many people trying to update the same post, or at least the
contention on that is very low.

--
Kerr

Re: playing with tags

Posted by Dean Landolt <de...@deanlandolt.com>.
On Thu, Feb 12, 2009 at 5:42 AM, Nicolas Clairon <cl...@gmail.com> wrote:

> Hi there !
>
> I've been playing with tags lately and a question came up.
> For example, I have a bunch of blog articles:
>
> article1 = {
>  ...snip...,
>  tags : ["couchdb", "python", "best practices"],
> }
>
> article2 = {
>    ...snip...,
>    tags : ["python", "best practices"],
> }
>
> article3 = {
>    ...snip...,
>    tags : ["couchdb", "best practices"],
> }


This is an issue that has been nagging at me lately. Storing tags in the doc
seems like a recipe for disaster (that is, if you consider view contention
disaster). I would argue that tags (and other readily changeable
user-specific state information like read/unread, favorite, blah blah)
should be kept in separate docs and bridged together in views. Of course,
doing this means you lose the ability to map through this data (e.g. no tag
clouds), so it's a lose/lose right now. This is something I just can't
figure out how to get around without a map/reduce/map kind of paradigm.

Paul Davis mentioned some ideas about value indexes and view intersections
that may help here. Does anyone else have opinions on how best to design for
this? Someone floated the article *Accountants Don't Use Erasers* [1]
recently that further convinced me this kind of data should be external to
the doc in question, but I'm at a loss for how to do this without
sacrificing functionality.

[1]
http://blogs.msdn.com/pathelland/archive/2007/06/14/accountants-don-t-use-erasers.aspx

Re: playing with tags

Posted by Jeremy Wall <jw...@google.com>.
Map functions can't see the requested keys. They emit the keys for an index
that couch builds.
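
A toy model of that, not CouchDB's actual implementation: buildIndex, query
and byTag are hypothetical names, and emit is passed in as a parameter here
only so the sketch is self-contained (in CouchDB it is a global).

```javascript
// Toy model of a view: run the map function over every doc once, at index
// time, and store the emitted rows.
function buildIndex(docs, mapFn) {
    var rows = [];
    docs.forEach(function (doc) {
        mapFn(doc, function emit(key, value) {
            rows.push({ key: key, id: doc._id, value: value });
        });
    });
    return rows;
}

// Query time only filters rows that already exist; mapFn is not run again,
// so it never gets a chance to look at the requested key.
function query(index, key) {
    return index.filter(function (row) { return row.key === key; });
}

function byTag(doc, emit) {
    if (doc.type !== "article" || !doc.tags) return;
    doc.tags.forEach(function (tag) { emit(tag, null); });
}

var index = buildIndex([
    { _id: "article1", type: "article", tags: ["couchdb", "python"] },
    { _id: "article2", type: "article", tags: ["python"] }
], byTag);

console.log(query(index, "python").length);         // 2
console.log(query(index, "python,couchdb").length); // 0
```

The second lookup finds nothing because no row was ever emitted with the key
"python,couchdb"; the map function would have had to emit that exact key when
the index was built.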


On Feb 12, 2009 9:22 AM, "Michal Frackowiak" <mi...@wikidot.com> wrote:

I think you could create a separate view for selecting articles by multiple
tags.

1. Pass keys as a single string, e.g. ...?key="python,couchdb"
2. Explode the string in the view map function and return doc only if all
tags match.

Would it work?

Michal

On Feb 12, 2009, at 12:22 PM, Nicolas Clairon wrote: > Thanks for your quick
response. > > the l...
---------------
Michal Frackowiak
http://michalfrackowiak.com

Re: playing with tags

Posted by Michal Frackowiak <mi...@wikidot.com>.
I think you could create a separate view for selecting articles by  
multiple tags.

1. Pass keys as a single string, e.g. ...?key="python,couchdb"
2. Explode the string in the view map function and return doc only if  
all tags match.

Would it work?

Michal


On Feb 12, 2009, at 12:22 PM, Nicolas Clairon wrote:

> Thanks for your quick response.
>
> the line:
>
> curl -X POST http://localhost:5984/blog/_view/articles/by_tags -d
> '{"keys":["python", "couchdb"]}'
>
> gives me all the articles tagged with "python" *or* "couchdb"... but
> how can I handle this to have only
> the articles tagged with "python" *and* "couchdb" in one shot ?

---------------
Michal Frackowiak
http://michalfrackowiak.com





Re: playing with tags

Posted by Nicolas Clairon <cl...@gmail.com>.
Thanks a lot. I'll keep in touch.

On Thu, Feb 12, 2009 at 1:54 PM, Jeff Hinrichs - DM&T
<du...@gmail.com> wrote:
> On Thu, Feb 12, 2009 at 5:22 AM, Nicolas Clairon <cl...@gmail.com> wrote:
>> Thanks for your quick response.
>>
>> the line:
>>
>> curl -X POST http://localhost:5984/blog/_view/articles/by_tags -d
>> '{"keys":["python", "couchdb"]}'
>>
>> gives me all the articles tagged with "python" *or* "couchdb"... but
>> how can I handle this to have only
>> the articles tagged with "python" *and* "couchdb" in one shot ?
> You are correct, currently that feature only allows unions and not
> intersections, which is what you want.  There is talk on the dev list,
> http://mail-archives.apache.org/mod_mbox/couchdb-dev/200902.mbox/browser
> about adding such functionality to couch, but currently your app must
> do the work.
>
> Regards,
>
> Jeff H
>>
>>
>> On Thu, Feb 12, 2009 at 11:53 AM, Jan Lehnardt <ja...@apache.org> wrote:
>>> Hi,
>>>
>>> See http://wiki.apache.org/couchdb/HTTP_view_API
>>>
>>> and "POST" under "Query Options".
>>>
>>> You can POST a JSON structure with all your keys to
>>> a view and get all matching rows.
>>>
>>> Cheers
>>> Jan
>>> --

Re: playing with tags

Posted by Jeff Hinrichs - DM&T <du...@gmail.com>.
On Thu, Feb 12, 2009 at 5:22 AM, Nicolas Clairon <cl...@gmail.com> wrote:
> Thanks for your quick response.
>
> the line:
>
> curl -X POST http://localhost:5984/blog/_view/articles/by_tags -d
> '{"keys":["python", "couchdb"]}'
>
> gives me all the articles tagged with "python" *or* "couchdb"... but
> how can I handle this to have only
> the articles tagged with "python" *and* "couchdb" in one shot ?
You are correct, currently that feature only allows unions and not
intersections, which is what you want.  There is talk on the dev list,
http://mail-archives.apache.org/mod_mbox/couchdb-dev/200902.mbox/browser
about adding such functionality to couch, but currently your app must
do the work.
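
For what it's worth, the app-side work can be small. A minimal sketch,
assuming you already have one array of view rows per requested tag (each row
carrying the document id in an id field); intersectById is a hypothetical
helper and the rows are hard-coded rather than fetched:

```javascript
// Hypothetical helper: given one array of rows per requested tag, return the
// ids of documents that appear under every tag.
function intersectById(rowsPerTag) {
    if (rowsPerTag.length === 0) return [];
    var counts = {};
    rowsPerTag.forEach(function (rows) {
        var seen = {};
        rows.forEach(function (row) {
            if (seen[row.id]) return; // count each doc once per tag
            seen[row.id] = true;
            counts[row.id] = (counts[row.id] || 0) + 1;
        });
    });
    // Keep only the ids that were seen under all of the tags.
    return Object.keys(counts).filter(function (id) {
        return counts[id] === rowsPerTag.length;
    });
}

// Rows shaped like view results, hard-coded here instead of fetched.
var couchdbRows = [{ id: "article1", key: "couchdb" }, { id: "article3", key: "couchdb" }];
var pythonRows = [{ id: "article1", key: "python" }, { id: "article2", key: "python" }];

console.log(intersectById([couchdbRows, pythonRows])); // [ 'article1' ]
```

With a single multi-key POST you would first group the returned rows by key
and then pass the groups to the same function.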

Regards,

Jeff H
>
>
> On Thu, Feb 12, 2009 at 11:53 AM, Jan Lehnardt <ja...@apache.org> wrote:
>> Hi,
>>
>> See http://wiki.apache.org/couchdb/HTTP_view_API
>>
>> and "POST" under "Query Options".
>>
>> You can POST a JSON structure with all your keys to
>> a view and get all matching rows.
>>
>> Cheers
>> Jan
>> --

Re: playing with tags

Posted by Nicolas Clairon <cl...@gmail.com>.
Thanks for your quick response.

the line:

curl -X POST http://localhost:5984/blog/_view/articles/by_tags -d
'{"keys":["python", "couchdb"]}'

gives me all the articles tagged with "python" *or* "couchdb"... but
how can I get only the articles tagged with "python" *and* "couchdb"
in one shot?


On Thu, Feb 12, 2009 at 11:53 AM, Jan Lehnardt <ja...@apache.org> wrote:
> Hi,
>
> See http://wiki.apache.org/couchdb/HTTP_view_API
>
> and "POST" under "Query Options".
>
> You can POST a JSON structure with all your keys to
> a view and get all matching rows.
>
> Cheers
> Jan
> --

Re: playing with tags

Posted by Jan Lehnardt <ja...@apache.org>.
Hi,

See http://wiki.apache.org/couchdb/HTTP_view_API

and "POST" under "Query Options".

You can POST a JSON structure with all your keys to
a view and get all matching rows.

Cheers
Jan
--
