Posted to user@couchdb.apache.org by Anthony Mills <am...@gascard.net> on 2008/04/26 06:34:36 UTC

Views

I read most of the documentation, wiki and blogs, but I still do not
see how to accomplish a certain scenario.  Hopefully I can describe it
adequately.

Let's say I have 1,000,000 documents [all of the same "type"] with a
date attribute.  Let's say I want to pick a subset of those documents.
How can I pick those documents of one type that fall on one day?  Will
I need to get all 1,000,000 documents?  What if I want all documents
of one type on one day that match another attribute?

I'm pretty sure this is what map/reduce will help with, but is there a
way to do this now?  Can you use more documents to build date relations?

Also, can you pass more variables than just key to the view function?

Thank you,

Anthony

Re: Views

Posted by Guby <gu...@gmail.com>.
Hi Anthony

You can do it like this:

function(doc){
	if(doc.type == "THETYPE"){
		map(doc.date, doc);
	}
}

Where type is an attribute storing the document type and date is the  
date attribute.
Then you can use a key to get the documents from a certain date, or you
could use startkey and endkey to get a range of documents.
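
As a rough illustration of the key and startkey/endkey options, the sketch below builds the query strings by hand. The database name "mydb", the view path "_view/docs/by_date" and the date format are assumptions made up for the example, not taken from this thread. The one point it tries to show is that each key is JSON-encoded and then URL-escaped.

```javascript
// Build a view query string. Keys are JSON-encoded, then URL-escaped.
// viewPath is a placeholder; use whatever path your CouchDB version expects.
function viewUrl(viewPath, params) {
  var parts = [];
  for (var name in params) {
    if (Object.prototype.hasOwnProperty.call(params, name)) {
      parts.push(name + "=" + encodeURIComponent(JSON.stringify(params[name])));
    }
  }
  return viewPath + "?" + parts.join("&");
}

// One exact date (hypothetical date format):
var oneDay = viewUrl("/mydb/_view/docs/by_date", { key: "2008-04-26" });

// A range of dates, both ends inclusive:
var range = viewUrl("/mydb/_view/docs/by_date", {
  startkey: "2008-04-01",
  endkey: "2008-04-30"
});
```

With key, only rows whose key equals the given value come back; with startkey and endkey, every row whose key sorts between the two comes back.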

If you want them to match another attribute as well you could either  
rewrite it like this:

function(doc){
	if(doc.type == "THETYPE" && doc.other_attribute == "some_value"){
		map(doc.date, doc);
	}
}

Or using view collation you could write it like this:

function(doc){
	if(doc.type == "THETYPE" && doc.other_attribute == "some_value"){
		map([doc.date,true], doc);
	}
	if(doc.type == "THETYPE" && doc.other_attribute != "some_value"){
		map([doc.date,false], doc);
	}
	if(doc.type == "THETYPE"){
		map(doc.date, doc);
	}
}

Then you can use keys like [date, true] to get the documents where the
other attribute equals "some_value", [date, false] to get the documents
where it does not, or just date to get documents regardless of the
state of the other attribute.
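
To see what rows such a view would produce, here is a small local simulation. It is only a sketch: the map() stub stands in for the function CouchDB supplies, the sample documents are invented, and the view function is a condensed but equivalent form of the one above.

```javascript
// Toy stand-in for the view server: collect every map(key, value) call.
var rows = [];
function map(key, value) { rows.push({ key: key, id: value._id }); }

// The collation-style view function from above, lightly condensed.
function viewFn(doc) {
  if (doc.type == "THETYPE") {
    map([doc.date, doc.other_attribute == "some_value"], doc);
    map(doc.date, doc);
  }
}

// Sample documents (made up for the example).
[
  { _id: "a", type: "THETYPE", date: "2008-04-26", other_attribute: "some_value" },
  { _id: "b", type: "THETYPE", date: "2008-04-26", other_attribute: "else" },
  { _id: "c", type: "OTHER",   date: "2008-04-26", other_attribute: "some_value" }
].forEach(viewFn);

// Exact-key lookup: compare JSON forms so array keys match by content.
function byKey(key) {
  var wanted = JSON.stringify(key);
  return rows
    .filter(function (r) { return JSON.stringify(r.key) === wanted; })
    .map(function (r) { return r.id; });
}

var matching    = byKey(["2008-04-26", true]);   // ["a"]
var notMatching = byKey(["2008-04-26", false]);  // ["b"]
var anyState    = byKey("2008-04-26");           // ["a", "b"]
```

Each matching document ends up in the index twice, once under the array key and once under the plain date, which is the price of being able to ask both questions from one view.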

Hope that helps.

Best regards
Sebastian





Re: Views

Posted by Jan Lehnardt <ja...@apache.org>.
On Apr 27, 2008, at 03:12, Anthony Mills wrote:
> Thank you everyone for answering my questions.
>
> Here is the way I understand it.  The first time a view is run, it
> creates a key-value list from all documents.  Future calls to the
> view update the key-value list with changed documents [added,
> deleted, updated].
> If a startkey, endkey or key is used, only those keys that match in  
> the list are returned.

Where "match" means either single entries from the view index or
consecutive ranges, but nothing with gaps.


> If I use a different startkey, endkey or key, the key-value list is  
> not rebuilt, it uses the keys from the first view.
> Did I get it right?

Yes.


> Sorry about being obtuse. I have a project that can have over 10  
> million documents and I need to understand how they can be indexed.

As Chris mentioned, it might be best to play around with sample data
to get a feel for views. Check out Futon, our built-in administration
client; it lets you define ad-hoc queries that you can modify at
will and later save permanently: http://localhost:5984/_utils/

Cheers
Jan
--




Re: Views

Posted by Chris Anderson <jc...@mfdz.com>.
On Sat, Apr 26, 2008 at 6:12 PM, Anthony Mills <am...@gascard.net> wrote:

>  If I use a different startkey, endkey or key, the key-value list is not
> rebuilt, it uses the keys from the first view.
>  Did I get it right?

You got it.

>  Sorry about being obtuse. I have a project that can have over 10 million
> documents and I need to understand how they can be indexed.

In my experience, the best way to get a feel for what you can do is by
playing around with data. I have a little Ruby script to batch data
over from Postgres so I can try view functions etc. -- It's more fun
if you work on a lot of data. A cool thing to remember is that you can
call the emit function more than once in a view, with different keys
and values, so views are extraordinarily flexible. I'm still getting
my bearings on the "best" way to store certain kinds of data - I have
a hunch that different schema/view styles may turn out to be more
efficient than others.

Chris


-- 
Chris Anderson
http://jchris.mfdz.com

Re: Views

Posted by Anthony Mills <am...@gascard.net>.
Thank you everyone for answering my questions.

Here is the way I understand it.  The first time a view is run, it
creates a key-value list from all documents.  Future calls to the
view update the key-value list with changed documents [added,
deleted, updated].
If a startkey, endkey or key is used, only those keys that match in
the list are returned.
If I use a different startkey, endkey or key, the key-value list is
not rebuilt; it uses the keys from the first view.
Did I get it right?

Sorry about being obtuse. I have a project that can have over 10  
million documents and I need to understand how they can be indexed.

Thank you,
Anthony



Re: Views

Posted by Jan Lehnardt <ja...@apache.org>.
Heya Anthony,
On Apr 26, 2008, at 20:50, Anthony Mills wrote:
> Maybe I'm missing something.  When you create a view, does it create
> indexes for attributes in the database?  When you add new documents,
> do they automatically create the index for the attributes for the
> view?

A view index only has a single index which is what you send in as the  
first argument in the map() function. Nothing else is going on  
automatically.


> Also, can I call my view with something like
> ?startkey=['20080403t000000', 1234]&endkey=['20080405t235959', 1234] to
>
> function(doc){
> 	if(doc.type == "hello"){
> 		map([doc.date, doc.number], doc);
> 	}
> }
>
> Then, through the magic of couchdb, I'll only get back those  
> documents between the April 3rd and 5th whose attribute number=1234?

Nope, you'd need a [doc.number, doc.date] index for that. It is
straightforward rather than magical. The map() function just creates a
key-value list that is sorted by key, and you can query only ranges
within the key-space.
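
A small simulation may make the difference between the two key orders concrete. This is a sketch under stated assumptions: compareKeys is a simplified comparison written for this example (real CouchDB collation covers more types), and the three documents are invented.

```javascript
// Simplified key comparison: arrays compare element by element.
// Handles only numbers and strings, which is all the example needs.
function compareKeys(a, b) {
  for (var i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

// Return the ids of all rows whose key lies in [startkey, endkey].
function range(rows, startkey, endkey) {
  return rows
    .slice()
    .sort(function (x, y) { return compareKeys(x.key, y.key); })
    .filter(function (r) {
      return compareKeys(r.key, startkey) >= 0 && compareKeys(r.key, endkey) <= 0;
    })
    .map(function (r) { return r.id; });
}

// Made-up documents, indexed two ways.
var docs = [
  { id: "x", date: "20080403t120000", number: 1234 },
  { id: "y", date: "20080404t120000", number: 9999 },
  { id: "z", date: "20080405t120000", number: 1234 }
];
var byDateNumber = docs.map(function (d) { return { key: [d.date, d.number], id: d.id }; });
var byNumberDate = docs.map(function (d) { return { key: [d.number, d.date], id: d.id }; });

// [date, number]: the range is contiguous by date, so "y" (number 9999) is included.
var dateFirst = range(byDateNumber, ["20080403t000000", 1234], ["20080405t235959", 1234]);

// [number, date]: every row in the range shares number 1234.
var numberFirst = range(byNumberDate, [1234, "20080403t000000"], [1234, "20080405t235959"]);
```

With the date first, the second key element only breaks ties between rows that share the same date, so it does not act as a filter. With the number first, all rows for number 1234 sit next to each other and the date range cuts a contiguous slice out of them.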


> Will couchdb only search through records that match the key? or will  
> it need to go through all documents every time I call the view?

To build the view index CouchDB will go through all documents. But
only once. For documents that change, get deleted or added, CouchDB
incrementally updates the index. Also, view indexes are built when you
query the view, not when you add documents.
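
The "build once, then update incrementally at query time" behaviour can be pictured with a toy model. This is not CouchDB's implementation: ToyView, the change-list shape and the emit callback are all invented for the sketch, and it re-sorts on every query where a real index would not need to.

```javascript
// Toy model of a lazily updated view index.
function ToyView(mapFn) {
  this.mapFn = mapFn;
  this.rowsById = {};   // doc id -> rows emitted for that doc
  this.lastSeq = 0;     // highest change sequence already indexed
  this.docsMapped = 0;  // how many documents have been run through mapFn
}

// changes: array of { seq, id, doc } in sequence order; doc is null for a deletion.
ToyView.prototype.query = function (changes) {
  var self = this;
  changes.forEach(function (change) {
    if (change.seq <= self.lastSeq) return;   // already indexed, skip
    delete self.rowsById[change.id];          // drop stale rows for this doc
    if (change.doc) {
      var emitted = [];
      self.mapFn(change.doc, function (key, value) {
        emitted.push({ key: key, value: value, id: change.id });
      });
      self.rowsById[change.id] = emitted;
      self.docsMapped++;
    }
    self.lastSeq = change.seq;
  });
  // Simplification: gather and sort all rows on every query.
  var all = [];
  Object.keys(this.rowsById).forEach(function (id) {
    all = all.concat(self.rowsById[id]);
  });
  return all.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
};

var view = new ToyView(function (doc, emit) {
  if (doc.type == "hello") emit(doc.date, doc.number);
});
var changes = [
  { seq: 1, id: "a", doc: { type: "hello", date: "20080403", number: 1 } },
  { seq: 2, id: "b", doc: { type: "hello", date: "20080401", number: 2 } }
];
var first = view.query(changes);    // first query: maps both documents
changes.push({ seq: 3, id: "c", doc: { type: "hello", date: "20080402", number: 3 } });
var second = view.query(changes);   // second query: maps only document "c"
```

After the second query, docsMapped is 3 rather than 5: the two documents indexed by the first query were not mapped again.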


> To get nerdy, I want my views to find records in O(log n) not O(n).

You get your results in O(1) ;-) (after the first query to each view).
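
Taken literally, finding a key in a sorted index is a logarithmic operation rather than a constant-time one; CouchDB keeps view indexes in B-trees. The sketch below uses a plain sorted array and binary search as a stand-in, purely to show the order of magnitude. The function name and the sample keys are made up for the example.

```javascript
// Lower-bound binary search: index of the first key >= target.
// Also counts loop iterations to show the logarithmic cost.
function lowerBound(sortedKeys, target) {
  var lo = 0, hi = sortedKeys.length, steps = 0;
  while (lo < hi) {
    var mid = (lo + hi) >>> 1;
    steps++;
    if (sortedKeys[mid] < target) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return { index: lo, steps: steps };
}

// One million sorted numeric keys stand in for a large view index.
var keys = [];
for (var i = 0; i < 1000000; i++) {
  keys.push(i * 2);
}
var found = lowerBound(keys, 1234568);
```

For a million keys the search needs at most about twenty steps, so going from one million to ten million documents adds only a handful more.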

In relational terms, think of a view as an index on a column without
the write penalty. So have as many as you might need.

I hope that helps, feel free to send more questions :)

Cheers
Jan
--





Re: Views

Posted by Chris Anderson <jc...@mfdz.com>.
On Sat, Apr 26, 2008 at 11:50 AM, Anthony Mills <am...@gascard.net> wrote:
>
>  Then, through the magic of couchdb, I'll only get back those documents between the April 3rd and 5th whose attribute number=1234?
>
>  Will couchdb only search through records that match the key? or will it
> need to go through all documents every time I call the view?
>
>  To get nerdy, I want my views to find records in O(log n) not O(n).
>

My understanding is that because CouchDB uses materialized views,
pulling sequential records from a view key is very efficient. The
database tends to put the weight of creating joins on the client. In
your case you'd select all the view rows in that date range using a
single query, filter them by document number in your application code,
and optionally run GETs to retrieve the indexed documents.
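
The client-side filtering step could look roughly like this. The rows are invented and only shaped like what a range query on a view keyed by [date, number] might return; filterByNumber is a name made up for the sketch.

```javascript
// Made-up rows, shaped like the result of a range query on a view
// keyed by [date, number].
var rows = [
  { id: "x", key: ["20080403t120000", 1234] },
  { id: "y", key: ["20080404t120000", 9999] },
  { id: "z", key: ["20080405t120000", 1234] }
];

// Keep only the rows whose second key element is the wanted number.
function filterByNumber(viewRows, number) {
  return viewRows.filter(function (row) { return row.key[1] === number; });
}

// The ids that would then be fetched with individual GETs, if needed.
var ids = filterByNumber(rows, 1234).map(function (row) { return row.id; });
```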

You could avoid the filtering step on the client by changing your view
function so the number sorts first:

function(doc){
       if(doc.type == "hello"){
               map([doc.number, doc.date], doc);
       }
}


As long as you only want to select documents with a single doc.number
at a time, this view returns just the data you need, with a query
like:

?startkey=[1234,"20080403t000000"]&endkey=[1234,"20080405t235959"]

On the downside, this version of the view becomes really nasty as soon
as you want to fetch all documents from a given time range, across all
values of doc.number. So unless you're going to be querying by
doc.number a lot (or you have a lot of doc numbers, and don't mind
maintaining both views), I'd recommend your view, plus filtering in
the application.


-- 
Chris Anderson
http://jchris.mfdz.com

Re: Views

Posted by Anthony Mills <am...@gascard.net>.
I read both those links.  I understand what they are trying to do, but  
I'm not really trying to collate two document types.

Maybe I'm missing something.  When you create a view, does it create
indexes for attributes in the database?  When you add new documents,
do they automatically create the index for the attributes for the view?

Also, can I call my view with something like
?startkey=['20080403t000000', 1234]&endkey=['20080405t235959', 1234] to

function(doc){
	if(doc.type == "hello"){
		map([doc.date, doc.number], doc);
	}
}

Then, through the magic of couchdb, I'll only get back those documents  
between the April 3rd and 5th whose attribute number=1234?

Will couchdb only search through records that match the key? or will  
it need to go through all documents every time I call the view?

To get nerdy, I want my views to find records in O(log n) not O(n).

Thanks,

Anthony



Re: Views

Posted by Chris Anderson <jc...@mfdz.com>.
Anthony,

http://wiki.apache.org/couchdb/ViewCollation is the way to accomplish
tasks like that.

Christopher Lenz has a write-up of how to use view collation to sort
views, achieving comments grouped by parent blog post.

 http://www.cmlenz.net/archives/2007/10/couchdb-joins

In your case you could index a view with date and type, like this

[type, date]

and then if you had say 5 types you'd do 5 GET queries against the
database, each one fetching only the documents for that day.
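
As a sketch of the "one query per type" idea, the function below produces the startkey/endkey pair for each type for a single day, assuming a view keyed by [type, date]. The type names and the timestamp format (day followed by "t" and a time of day) are assumptions for the example.

```javascript
// For each type, build the JSON-encoded startkey/endkey pair that covers
// one day in a view keyed by [type, date].
function dayRangesByType(types, day) {
  return types.map(function (type) {
    return {
      startkey: JSON.stringify([type, day + "t000000"]),
      endkey: JSON.stringify([type, day + "t235959"])
    };
  });
}

// Hypothetical type names; each entry corresponds to one GET.
var ranges = dayRangesByType(["invoice", "receipt"], "20080426");
```

Because the type comes first in the key, each range stays inside one type, and the date part narrows it to the requested day.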

View collation is one of my favorite things about CouchDB. I'm excited
about reduce, because from what I understand, you could use it to
lower this to 1 GET, if that's important to you.

enjoy,
Chris




-- 
Chris Anderson
http://jchris.mfdz.com