Posted to user@couchdb.apache.org by Anthony Mills <am...@gascard.net> on 2008/04/26 06:34:36 UTC
Views
I read most of the documentation, wiki and blogs, but I still do not
see how to accomplish a certain scenario. Hopefully I can describe it
adequately.
Let's say I have 1,000,000 documents [all of the same "type"] with a
date attribute. Let's say I want to pick a subset of those documents.
How can I pick those documents of one type that fall on one day? Will
I need to get all 1,000,000 documents? What if I want all documents
of one type on one day that match another attribute?
I'm pretty sure this is what map/reduce will help with, but is there a
way to do this now? Can you use more documents to build date relations?
Also, can you pass more variables than just key to the view function?
Thank you,
Anthony
Re: Views
Posted by Guby <gu...@gmail.com>.
Hi Anthony
You can do it like this:
function(doc) {
  if (doc.type == "THETYPE") {
    map(doc.date, doc);
  }
}
Where type is an attribute storing the document type and date is the
date attribute.
Then you can use a key to get the documents from a certain date, or you
could use startkey and endkey to get a range of documents.
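As a rough illustration of the two query styles just described, the sketch below builds the query string for a single-date lookup and for a date range. The helper name viewQuery and the date strings are made up for the example; only the parameter names key, startkey and endkey come from the thread. Keys are written as JSON, so string keys carry double quotes before being percent-encoded.

```javascript
// Hypothetical helper: turn view parameters into a URL query string.
// Each value is JSON-encoded and then percent-encoded.
function viewQuery(params) {
  return "?" + Object.keys(params)
    .map(function (name) {
      return name + "=" + encodeURIComponent(JSON.stringify(params[name]));
    })
    .join("&");
}

// All documents emitted under one exact date key.
var oneDay = viewQuery({ key: "20080403" });

// All documents whose date key falls between two keys.
var range = viewQuery({ startkey: "20080403", endkey: "20080405" });

console.log(oneDay);
console.log(range);
```

The query string would be appended to the view's URL; the URL itself is left out because it depends on the database and view names.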
If you want them to match another attribute as well you could either
rewrite it like this:
function(doc) {
  if (doc.type == "THETYPE" && doc.other_attribute == "some_value") {
    map(doc.date, doc);
  }
}
Or using view collation you could write it like this:
function(doc) {
  if (doc.type == "THETYPE" && doc.other_attribute == "some_value") {
    map([doc.date, true], doc);
  }
  if (doc.type == "THETYPE" && doc.other_attribute != "some_value") {
    map([doc.date, false], doc);
  }
  if (doc.type == "THETYPE") {
    map(doc.date, doc);
  }
}
Then you can use keys like [date, true] to get the documents where the
other attribute matches "some_value", [date, false] to get the documents
where it does not, or just date to get documents regardless of the
state of the other attribute.
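To see which rows each of those keys selects, here is a small in-memory sketch. The map() defined below is only a stand-in that collects rows, the sample documents are invented, and byKey is a hypothetical helper that mimics an exact-key lookup by comparing the JSON form of the keys.

```javascript
// Stand-in for the view server: collects whatever the view function emits.
var rows = [];
function map(key, value) {
  rows.push({ key: key, value: value });
}

// The collation-style view function discussed above.
function view(doc) {
  if (doc.type == "THETYPE" && doc.other_attribute == "some_value") {
    map([doc.date, true], doc);
  }
  if (doc.type == "THETYPE" && doc.other_attribute != "some_value") {
    map([doc.date, false], doc);
  }
  if (doc.type == "THETYPE") {
    map(doc.date, doc);
  }
}

// Made-up sample documents.
var docs = [
  { _id: "a", type: "THETYPE", date: "20080403", other_attribute: "some_value" },
  { _id: "b", type: "THETYPE", date: "20080403", other_attribute: "something_else" },
  { _id: "c", type: "OTHER", date: "20080403", other_attribute: "some_value" }
];
docs.forEach(view);

// Exact-key lookup: compare keys by their JSON form, return document ids.
function byKey(key) {
  var wanted = JSON.stringify(key);
  return rows
    .filter(function (row) { return JSON.stringify(row.key) === wanted; })
    .map(function (row) { return row.value._id; });
}

console.log(byKey(["20080403", true]));   // only the matching document
console.log(byKey(["20080403", false]));  // only the non-matching document
console.log(byKey("20080403"));           // both documents of THETYPE
```

Each document of the right type produces two rows here, so the index is roughly twice the size of the single-key version.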
Hope that helps.
Best regards
Sebastian
Re: Views
Posted by Jan Lehnardt <ja...@apache.org>.
On Apr 27, 2008, at 03:12, Anthony Mills wrote:
> Thank you everyone for answering my questions.
>
> Here is the way I understand it. The first time a view is run, it
> creates a key-value list from all documents. Future calls to the
> view update the key-value list with changed documents [added,
> deleted, updated].
> If a startkey, endkey or key is used, only those keys that match in
> the list are returned.
Where "match" means either single entries from the view index or
consecutive ranges, but nothing with gaps.
> If I use a different startkey, endkey or key, the key-value list is
> not rebuilt, it uses the keys from the first view.
> Did I get it right?
Yes.
> Sorry about being obtuse. I have a project that can have over 10
> million documents and I need to understand how they can be indexed.
As Chris mentioned, it might be best to play around with sample data
to get a feel for views. Check out Futon, our built-in administration
client. It lets you define ad-hoc queries that you can modify at
will and later save permanently: http://localhost:5984/_utils/
Cheers
Jan
Re: Views
Posted by Chris Anderson <jc...@mfdz.com>.
On Sat, Apr 26, 2008 at 6:12 PM, Anthony Mills <am...@gascard.net> wrote:
> If I use a different startkey, endkey or key, the key-value list is not
> rebuilt, it uses the keys from the first view.
> Did I get it right?
You got it.
> Sorry about being obtuse. I have a project that can have over 10 million
> documents and I need to understand how they can be indexed.
In my experience, the best way to get a feel for what you can do is by
playing around with data. I have a little Ruby script to batch data
over from Postgres so I can try view functions etc. -- It's more fun
if you work on a lot of data. A cool thing to remember is that you can
call the emit function more than once in a view, with different keys
and values, so views are extraordinarily flexible. I'm still getting
my bearings on the "best" way to store certain kinds of data - I have
a hunch that different schema/view styles may turn out to be more
efficient than others.
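A minimal sketch of what calling the emit function more than once looks like. The collector is named emit to match the wording above, while the code samples elsewhere in this thread call it map(); the collector itself, the key layout and the sample document are all assumptions made for illustration.

```javascript
// Stand-in for the view server's row collector (hypothetical).
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// One document produces two rows under two different keys.
function view(doc) {
  if (doc.type == "hello") {
    emit(["date", doc.date], doc._id);
    emit(["number", doc.number], doc._id);
  }
}

view({ _id: "a", type: "hello", date: "20080403t120000", number: 1234 });
console.log(JSON.stringify(rows));
```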
Chris
--
Chris Anderson
http://jchris.mfdz.com
Re: Views
Posted by Anthony Mills <am...@gascard.net>.
Thank you everyone for answering my questions.
Here is the way I understand it. The first time a view is run, it
creates a key-value list from all documents. Future calls to the
view update the key-value list with changed documents [added,
deleted, updated].
If a startkey, endkey or key is used, only those keys that match in
the list are returned.
If I use a different startkey, endkey or key, the key-value list is
not rebuilt, it uses the keys from the first view.
Did I get it right?
Sorry about being obtuse. I have a project that can have over 10
million documents and I need to understand how they can be indexed.
Thank you,
Anthony
Re: Views
Posted by Jan Lehnardt <ja...@apache.org>.
Heya Anthony,
On Apr 26, 2008, at 20:50, Anthony Mills wrote:
> Maybe I'm missing something. When you create a view, does it create
> indexes for attributes in the database? When you add new documents,
> do they automatically create the index for the attributes for the
> view?
A view index only has a single index which is what you send in as the
first argument in the map() function. Nothing else is going on
automatically.
> Also, can I call my view with something like
> ?startkey=["20080403t000000", 1234]&endkey=["20080405t235959", 1234]
> on a view like this:
>
> function(doc) {
>   if (doc.type == "hello") {
>     map([doc.date, doc.number], doc);
>   }
> }
>
> Then, through the magic of couchdb, I'll only get back those
> documents between the April 3rd and 5th whose attribute number=1234?
Nope, you'd need a [doc.number, doc.date] index for that. It is
straightforward rather than magical. The map() function just creates a
key-value list that is sorted by key, and you can only query ranges
within the key space.
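The difference between the two key orders can be sketched in memory. The comparison function below is a simplified stand-in for CouchDB's collation rules, sufficient only for keys made of one string and one number in a fixed order; the documents and the helper names compareKeys and range are invented for the example.

```javascript
// Simplified key comparison: arrays are compared element by element.
function compareKeys(a, b) {
  for (var i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

// Return the contiguous slice of the sorted key list between two keys.
function range(keys, startkey, endkey) {
  return keys.slice().sort(compareKeys).filter(function (k) {
    return compareKeys(k, startkey) >= 0 && compareKeys(k, endkey) <= 0;
  });
}

// Made-up documents: two with number 1234, one with number 99.
var docs = [
  { date: "20080403t090000", number: 1234 },
  { date: "20080404t120000", number: 99 },
  { date: "20080405t180000", number: 1234 }
];

var byDateFirst = docs.map(function (d) { return [d.date, d.number]; });
var byNumberFirst = docs.map(function (d) { return [d.number, d.date]; });

// [date, number] keys: the range spans every number inside the date window.
var a = range(byDateFirst, ["20080403t000000", 1234], ["20080405t235959", 1234]);

// [number, date] keys: the range stays inside number 1234.
var b = range(byNumberFirst, [1234, "20080403t000000"], [1234, "20080405t235959"]);

console.log(a.length, b.length);
```

With the date first, the document numbered 99 sits between the two end keys and is returned as well; with the number first, it sorts outside the range.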
> Will couchdb only search through records that match the key? or will
> it need to go through all documents every time I call the view?
To build the view index CouchDB will go through all documents. But
only once. For documents that change, get deleted or added, CouchDB
incrementally updates the index. Also, view indexes are built when you
query the view, not when you add documents.
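The build-on-query, update-incrementally behaviour can be pictured with a toy model. This is not how CouchDB is implemented; ToyView, its change-list format and the docsMapped counter are all hypothetical, and the map function receives its emit callback as a second argument purely to keep the sketch self-contained.

```javascript
// Toy model of a lazily built, incrementally updated view index.
function ToyView(mapFn) {
  this.mapFn = mapFn;
  this.rowsByDoc = {};   // document id -> rows emitted for that document
  this.lastSeq = 0;      // highest change sequence already indexed
  this.docsMapped = 0;   // counter, only here to show how much work was done
}

// changes: array of { seq, id, doc }, ordered by seq; doc is null for a deletion.
ToyView.prototype.query = function (changes) {
  var self = this;
  changes.forEach(function (change) {
    if (change.seq <= self.lastSeq) return;   // already indexed, skip
    delete self.rowsByDoc[change.id];         // drop stale rows for this document
    if (change.doc !== null) {
      var rows = [];
      self.mapFn(change.doc, function (key, value) {
        rows.push({ key: key, value: value });
      });
      self.rowsByDoc[change.id] = rows;
      self.docsMapped++;
    }
    self.lastSeq = change.seq;
  });
  // Flatten and sort by key (string keys only in this sketch).
  var all = [];
  Object.keys(this.rowsByDoc).forEach(function (id) {
    all = all.concat(self.rowsByDoc[id]);
  });
  return all.sort(function (x, y) {
    return x.key < y.key ? -1 : x.key > y.key ? 1 : 0;
  });
};

var view = new ToyView(function (doc, emit) {
  if (doc.type == "hello") emit(doc.date, doc.number);
});

var changes = [
  { seq: 1, id: "a", doc: { type: "hello", date: "20080403", number: 1 } },
  { seq: 2, id: "b", doc: { type: "hello", date: "20080404", number: 2 } }
];
var first = view.query(changes);               // first query maps both documents
changes.push({ seq: 3, id: "a", doc: null });  // document "a" is deleted
var second = view.query(changes);              // only the new change is processed

console.log(first.length, second.length, view.docsMapped);
```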
> To get nerdy, I want my views to find records in O(log n) not O(n).
You get your results in O(1) ;-) (after the first query to each view).
In relational terms, think of a view as an index on a column without
the write penalty. So have as many as you might need.
I hope that helps, feel free to send more questions :)
Cheers
Jan
Re: Views
Posted by Chris Anderson <jc...@mfdz.com>.
On Sat, Apr 26, 2008 at 11:50 AM, Anthony Mills <am...@gascard.net> wrote:
>
> Then, through the magic of couchdb, I'll only get back those documents between the April 3rd and 5th whose attribute number=1234?
>
> Will couchdb only search through records that match the key? or will it
> need to go through all documents every time I call the view?
>
> To get nerdy, I want my views to find records in O(log n) not O(n).
>
My understanding is that because CouchDB uses materialized views,
pulling sequential records from a view key is very efficient. The
database tends to put the weight of creating joins on the client. In
your case you'd select all the view rows in that date range using a
single query, filter them by document number in your application code,
and optionally run GETs to retrieve the indexed documents.
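The client-side filtering step might look something like this. The row shape (id, key, value) and the sample data are assumptions for the sketch, and filterByNumber is a hypothetical helper; the rows stand for the result of one date-range query on the [doc.date, doc.number] view.

```javascript
// Rows as they might come back from the date-range query (made-up data).
var rows = [
  { id: "a", key: ["20080403t090000", 1234], value: null },
  { id: "b", key: ["20080404t120000", 99], value: null },
  { id: "c", key: ["20080405t180000", 1234], value: null }
];

// Keep only rows whose second key element matches the wanted number.
function filterByNumber(rows, number) {
  return rows.filter(function (row) { return row.key[1] === number; });
}

var ids = filterByNumber(rows, 1234).map(function (row) { return row.id; });
console.log(ids);
```

The resulting ids are what the application would then use for any follow-up GETs.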
You could avoid the filtering step on the client by changing your view
function so the number sorts first:
function(doc) {
  if (doc.type == "hello") {
    map([doc.number, doc.date], doc);
  }
}
As long as you only want to select documents with a single doc.number
at a time, this view returns just the data you need, with a query
like:
?startkey=[1234,"20080403t000000"]&endkey=[1234,"20080405t235959"]
On the downside, this version of the view becomes really nasty as soon
as you want to fetch all documents from a given time range, across all
values of doc.number. So unless you're going to be querying by
doc.number a lot (or you have a lot of doc numbers, and don't mind
maintaining both views), I'd recommend your view, plus filtering in
the application.
--
Chris Anderson
http://jchris.mfdz.com
Re: Views
Posted by Anthony Mills <am...@gascard.net>.
I read both those links. I understand what they are trying to do, but
I'm not really trying to collate two document types.
Maybe I'm missing something. When you create a view, does it create
indexes for attributes in the database? When you add new documents,
do they automatically create the index for the attributes for the view?
Also, can I call my view with something like
?startkey=["20080403t000000", 1234]&endkey=["20080405t235959", 1234]
on a view like this:
function(doc) {
  if (doc.type == "hello") {
    map([doc.date, doc.number], doc);
  }
}
Then, through the magic of couchdb, I'll only get back those documents
between the April 3rd and 5th whose attribute number=1234?
Will couchdb only search through records that match the key? or will
it need to go through all documents every time I call the view?
To get nerdy, I want my views to find records in O(log n) not O(n).
Thanks,
Anthony
Re: Views
Posted by Chris Anderson <jc...@mfdz.com>.
Anthony,
http://wiki.apache.org/couchdb/ViewCollation is the way to accomplish
tasks like that.
Christopher Lenz has a write-up of how to use view collation to sort
views, achieving comments grouped by parent blog post.
http://www.cmlenz.net/archives/2007/10/couchdb-joins
In your case you could index a view with type and date, like this:
[type, date]
and then if you had say 5 types you'd do 5 GET queries against the
database, each one fetching only the documents for that day.
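One way to picture the per-type queries: build one exact-key query string per type for the chosen day. The type names and the day are invented, and the sketch assumes the view emits [doc.type, doc.date] with the date stored at day granularity; with timestamps, a startkey/endkey pair per type would be needed instead.

```javascript
// Hypothetical type names and day.
var types = ["invoice", "payment", "refund", "note", "receipt"];
var day = "20080403";

// One query string per type, each selecting the key [type, day].
var queries = types.map(function (type) {
  return "?key=" + encodeURIComponent(JSON.stringify([type, day]));
});

// Print the decoded form so the keys are readable.
queries.forEach(function (q) {
  console.log(decodeURIComponent(q));
});
```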
View collation is one of my favorite things about CouchDB. I'm excited
about reduce, because from what I understand, you could use it to
lower this to 1 GET, if that's important to you.
enjoy,
Chris
--
Chris Anderson
http://jchris.mfdz.com