Posted to user@couchdb.apache.org by Nicholas Retallack <ni...@gmail.com> on 2008/08/19 08:27:12 UTC

Reduce is Really Slow!

You may have noticed my old post talking about reduce's weird behavior.
Well, I just tried out a reduce technique on a relatively large database,
and it seems quirks are the least of our problems.

This is the view I used: http://www.friendpaste.com/2AHz3ahr
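
The paste may no longer resolve, so here is a sketch of the general shape
described later in the thread: the map keeps the 'offer' documents keyed by
name, and the reduce merges all same-named documents into one, turning
repeated fields into lists.  The field names are illustrative, not the exact
code from the paste:

function(doc) {
  // map: keep only the offer documents, keyed by name
  if (doc.Type == 'offer') {
    emit(doc.name, doc);
  }
}

function(keys, values) {
  // reduce: merge same-named documents, turning each field into a list.
  // The returned value grows with the number of rows reduced, and the
  // rereduce case is not handled at all, which is exactly the problem
  // discussed further down in the thread.
  var merged = {};
  for (var i = 0; i < values.length; i++) {
    for (var field in values[i]) {
      if (!merged[field]) merged[field] = [];
      merged[field].push(values[i][field]);
    }
  }
  return merged;
}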

My database is 78.3MB and contains 77604 documents, about 34000 of which
satisfy doc.Type == 'offer'.  I will do some queries with &count=1 to
minimize download time as a factor.

A normal query on this view takes 4.5 seconds.  That's pretty bad, since I
wanted to use this for a web interface.  If I add group=true&group_level=1
to my query string, it jumps up to 25 seconds!  If I remove the reduce
method from my view, queries take only 44 milliseconds.

I could not have predicted this big a performance problem just from using
reduce.  It makes the view pretty much unusable for a web interface.  In fact,
I might be better off doing the reduce operation myself in the application
layer, which is not very awesome at all.

Poking further, I see a huge performance problem even with the simplest of
reduce functions, such as function(keys,values){return values}.  I couldn't
get a benchmark on this one because it took so long to run that CouchDB timed
out and killed it.
> curl -X GET http://localhost:5984/clickfund/_view/offers/index
{"error":"error","reason":"{{nocatch,{map_process_error,\"map function timed
out\"}},\n [{couch_query_servers,readline,2},\n
 {couch_query_servers,read_json,1},\n  {couch_query_servers,prompt,2},\n
 {couch_query_servers,'-rereduce\/3-fun-0-',3},\n  {lists,zipwith,3},\n
 {couch_query_servers,rereduce,3},\n
 {couch_view,'-init_group\/4-fun-0-',4},\n
 {couch_btree,'-write_node\/3-lc$^0\/1-0-',3}]}"}

I am using CouchDBX 0.8.0.  I have been told that 0.8.1 improves on
JavaScript views, but I was unable to get it working on my Mac.  It compiles
and runs, but when I try to connect to it I am refused.  This is strange,
because CouchDBX works fine when I run it on the same port.

> couchdb
Apache CouchDB 0.9.0a686685-incubating (LogLevel=info)
Apache CouchDB is starting.

Config Info /usr/local/etc/couchdb/couch.ini:
CurrentWorkingDir=/Users/nick/hooraycouch
DbRootDir=/usr/local/var/lib/couchdb
BindAddress="127.0.0.1"
Port="5984"
DocumentRoot=/usr/local/share/couchdb/www
LogFile=/usr/local/var/log/couchdb/couch.log
UtilDriverDir=/usr/local/lib/couchdb/erlang/lib/couch-0.9.0a686685-incubating/priv/lib
DbUpdateNotificationProcesses=
FullTextSearchQueryServer=
javascript=/usr/local/bin/couchjs /usr/local/share/couchdb/server/main.js

(on another terminal)
> curl -X GET http://localhost:5984/_all_dbs
curl: (7) couldn't connect to host

Re: Reduce is Really Slow!

Posted by Nicholas Retallack <ni...@gmail.com>.
The reason I had multiple documents with the same name, sorted by
date, is that I'm trying to do something like revisions here.  I get a
brand new data set each day and import it into the database, and I
want to be able to draw a sparkline indicating how each integer field
has changed over time.

It takes a lot of imports, and it seems much faster to post than to
update.  Also, it's nice to know when each data set arrived, and to be
able to say with some certainty that a particular number is associated
with a particular date.  I guess there are different ways to track
this though.

On Wed, Aug 20, 2008 at 4:49 PM, Paul Davis <pa...@gmail.com> wrote:
> Hermm. Yeah, nothing comes to mind that's better than
> emit(doc.name, null) with a reduce that just returns 1, and then querying
> with group=true.
>
> The way you remember what's 100 documents ahead is by getting 101 and
> storing the id somewhere so you know where to start. Jumping into the
> middle of a result set requires you to know which key you want to
> start from. There's also a trick for going backwards in that the count
> number of results can be negative in which case it grabs the count
> preceding rows (that one surprised me a lot, but it makes sense when you
> understand the implementation).
>
> Also, I'm not sure what you mean by cache. I haven't implemented
> pagination yet, but in your case it'd be something like:
>
> key_range = fetch from doc_name_view with group=true from
> startkey=blah, count=101
> docs fetch from doc_values_view with startkey=key_range[0] and
> endkey=key_range[-2]
> display docs
> display pagination controls using doc_range[0] and doc_range[-1]
>
> (I think)
>
> Not sure about tag clouds. Academically that sounds like a reduce
> function, but it'd depend on how you do things. Something like for(tag
> in doc.tags) {emit(tag, 1);}, and your reduce is a simple
> function(keys, values) { return sum(values);}
>
> Sorry about the some_value_that_sorts_after_last_possible_date. I was
> too lazy to lookup what JSON value would sort after an integer or
> string.
>
> Also, you may reconsider your document design. Obviously I have no
> idea what your data looks like, but perhaps instead of adding multiple
> docs with the same doc.name, you just update the original doc to have
> array keys. Not always possible, but it's a thought.
>
> HTH,
> Paul
>
> On Wed, Aug 20, 2008 at 5:45 PM, Nicholas Retallack
> <ni...@gmail.com> wrote:
>> Oh clever.  I was considering a solution like this, but I was worried
>> I wouldn't know where to stop, and might end up chopping it between
>> some documents that should be grouped together.
>> "some_value_that_sorts_after_last_possible_date" solves that problem.
>> There's another problem though, for when I want to do pagination.  Say
>> I want to display exactly 100 of these on a page.  How do I know I've
>> fetched 100 of them, if any number of documents could be in a group?
>> Also, how would I know what document name appears 100 documents ahead
>> of this one?  This gets messy...
>>
>> Essentially I figured this should be a task the database is capable of
>> doing on its own.  I don't want every action in my web application to
>> have to solve the caching problem on its own, after doing serious
>> data-munging on all this ugly stuff I got back from the database.  How
>> do I know when the cache should be invalidated anyway, without insider
>> knowledge from the database?
>>
>> Hm, cleverness.  I guess I could figure out what every hundredth name
>> is by making a view for just the names and querying that.  Any
>> efficient way to reduce that list for uniqueness?  Perhaps group=true
>> and reduce = function(){return true}.  There should be a wiki page
>> devoted to these silly tricks, like this hackish way to put together
>> pagination.  And tag clouds.
>>
>> On Wed, Aug 20, 2008 at 1:56 PM, Paul Davis <pa...@gmail.com> wrote:
>>> If I'm not mistaken, you have a number of documents that all have a
>>> given 'name'. And you want the list of elements for each value of
>>> 'name'. To accomplish this in db-land, you could use a design
>>> document like [1].
>>>
>>> Then to get the data for any given doc name, you'd query your map like
>>> [2]. This gets you everything emitted with a given doc name. The
>>> underlying idea to remember in getting data out of couch is that your
>>> maps should emit things that sort together. Then you can use 'slice'
>>> operations to pull at the documents you need.
>>>
>>> Your values aren't magically in one array, but merging the arrays in
>>> app-land is easy enough.
>>>
>>> If I've completely screwed up what you were going after, let me know.
>>>
>>> [1] http://www.friendpaste.com/2AHz3ahr
>>> [2] http://localhost:5984/dbname/_view/design_docid/index?startkey=["docname"]&endkey=["docname",
>>> some_value_that_sorts_after_last_possible_date]
>>>
>>> Paul
>>>
>>> On Wed, Aug 20, 2008 at 4:32 PM, Nicholas Retallack
>>> <ni...@gmail.com> wrote:
>>>> Replacing 'return values' with 'return values.length' shows you're
>>>> right.  4 minutes for the first query, milliseconds afterward, as
>>>> opposed to forever.
>>>>
>>>> I guess I was expecting reduce to do things it wasn't designed to do.
>>>> I notice ?group=true&group_level=1 is ignored unless a reduce function
>>>> of some sort exists though.  Is there any way to get this grouping
>>>> behavior without such extreme reductions in result size / performance?
>>>>
>>>> The view I was using here (http://www.friendpaste.com/2AHz3ahr) was
>>>> designed to simply take each document with the same name and merge
>>>> them into one document, turning same-named fields into lists (here's a
>>>> more general version http://www.friendpaste.com/Ud6ELaXC).  This
>>>> reduces the document size, but only by whatever overhead the repeated
>>>> field names would add.  The fields I was reducing only contained
>>>> integers, so reduction did shrink documents by quite a bit.  It was
>>>> pretty handy, but the query took 25 seconds to return one result even
>>>> when called repeatedly.
>>>>
>>>> Is there some technical reason for this limitation?
>>>>
>>>> I had assumed reduce was just an ordinary post-processing step that I
>>>> could run once and have something akin to a brand new generated table
>>>> to query on, so I wrote my views to transform my data to fit the
>>>> various ways I wanted to view it.  It worked fine for small amounts of
>>>> data in little experiments, but as soon as I used it on my real
>>>> database, I hit this wall.
>>>>
>>>> Are there plans to make reduce work for these more general
>>>> data-mangling tasks?  Or should I be approaching the problem a
>>>> different way?  Perhaps write my map calls differently so they produce
>>>> more rows for reduce to compact?  Or do something special if the third
>>>> parameter to reduce is true?
>>>>
>>>> On Tue, Aug 19, 2008 at 5:41 PM, Damien Katz <da...@apache.org> wrote:
>>>>> You can return arrays and objects, whatever json allows. But if the object
>>>>> keeps getting bigger the more rows it reduces, then it simply won't work.
>>>>>
>>>>> The exception is that the size of the reduce value can be logarithmic with
>>>>> respect to the rows. The simplest example of logarithmic growth is the
>>>>> summing of a row value. With Erlang's bignums, the size on disk is
>>>>> Log2(Sum(Rows)), which is perfectly acceptable growth.
>>>>>
>>>>> -Damien
>>>>>
>>>>> On Aug 19, 2008, at 8:14 PM, Nicholas Retallack wrote:
>>>>>
>>>>>> Oh!  I didn't realize that was a rule.  I had used 'return values' in
>>>>>> an attempt to run the simplest test possible on my data.  But hey, values is
>>>>>> an
>>>>>> array.  Does that mean you're not allowed to return objects like arrays
>>>>>> from
>>>>>> reduce at all?  Because I was kind of hoping I could.  I was able to do it
>>>>>> with smaller amounts of data, after all.  Perhaps this is due to re-reduce
>>>>>> kicking in?
>>>>>>
>>>>>> For the record, couchdb is still working on this query I started hours
>>>>>> ago,
>>>>>> and chewing up all my cpu.  I am going to have to kill it so I can get
>>>>>> some
>>>>>> work done.
>>>>>>
>>>>>> On Tue, Aug 19, 2008 at 4:21 PM, Damien Katz <da...@apache.org> wrote:
>>>>>>
>>>>>>> I think the problem with your reduce is that it looks like it's not
>>>>>>> actually
>>>>>>> reducing to a single value, but instead using reduce for grouping data.
>>>>>>> That
>>>>>>> will cause severe performance problems.
>>>>>>>
>>>>>>> For reduce to work properly, you should end up with a fixed size data
>>>>>>> structure regardless of the number of values being reduced (not strictly
>>>>>>> true, but that's the general rule).
>>>>>>>
>>>>>>> -Damien
>>>>>>>
>>>>>>>
>>>>>>> On Aug 19, 2008, at 6:55 PM, Nicholas Retallack wrote:
>>>>>>>
>>>>>>> Okay, I got it built on gentoo instead, but I'm still having performance
>>>>>>>>
>>>>>>>> issues with reduce.
>>>>>>>>
>>>>>>>> Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [async-threads:0]
>>>>>>>> couchdb - Apache CouchDB 0.8.1-incubating
>>>>>>>>
>>>>>>>> Here's a query I tried to do:
>>>>>>>>
>>>>>>>> I freshly imported about 191MB of data in 155399 documents.  29090 are
>>>>>>>> not
>>>>>>>> discarded by map.  Map produces one row with 5 fields for each of these
>>>>>>>> documents.  After grouping, each group should have four rows.  Reduce is
>>>>>>>> a
>>>>>>>> simple function(keys,values){return values}.
>>>>>>>>
>>>>>>>> Here's the query call:
>>>>>>>> time curl -X GET '
>>>>>>>>
>>>>>>>>
>>>>>>>> http://localhost:5984/clickfund/_view/offers/index?count=1&group=true&group_level=1
>>>>>>>> '
>>>>>>>>
>>>>>>>> This is running on a 512MB slicehost account.  http://www.slicehost.com/
>>>>>>>>
>>>>>>>> I'd love to give you this command's execution time, since I ran it last
>>>>>>>> night before I went to bed, but it must have taken over an hour because
>>>>>>>> my
>>>>>>>> laptop went to sleep and severed the connection.  Trying it again.
>>>>>>>>
>>>>>>>> Considering it's blazing fast without the reduce function, I can only
>>>>>>>> assume
>>>>>>>> what's taking all this time is overhead setting up and tearing down the
>>>>>>>> simple function(keys,values){return values}.
>>>>>>>>
>>>>>>>> I can give you guys the python source to set up this database so you can
>>>>>>>> try
>>>>>>>> it yourself if you like.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Reduce is Really Slow!

Posted by Paul Davis <pa...@gmail.com>.
Hermm. Yeah, nothing comes to mind that's better than
emit(doc.name, null) with a reduce that just returns 1, and then querying
with group=true.

The way you remember what's 100 documents ahead is by getting 101 and
storing the id somewhere so you know where to start. Jumping into the
middle of a result set requires you to know which key you want to
start from. There's also a trick for going backwards in that the count
number of results can be negative in which case it grabs the count
preceding rows (that one surprised me a lot, but it makes sense when you
understand the implementation).

Also, I'm not sure what you mean by cache. I haven't implemented
pagination yet, but in your case it'd be something like:

key_range = fetch from doc_name_view with group=true from
startkey=blah, count=101
docs fetch from doc_values_view with startkey=key_range[0] and
endkey=key_range[-2]
display docs
display pagination controls using doc_range[0] and doc_range[-1]

(I think)
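
In curl terms, using the 0.8-era parameters that appear elsewhere in this
thread (count, group, startkey, endkey) and the doc_name_view /
doc_values_view names from the pseudocode above, the two fetches would look
something like this sketch.  The "9999-12-31" end key is just a stand-in for
some_value_that_sorts_after_last_possible_date, assuming the dates are stored
as ISO-style strings:

# 101 grouped names starting from the current page; the 101st row is
# where the next page starts
curl 'http://localhost:5984/dbname/_view/design_docid/doc_name_view?group=true&startkey="blah"&count=101'

# all rows for the names on this page
curl 'http://localhost:5984/dbname/_view/design_docid/doc_values_view?startkey=["first_name_on_page"]&endkey=["last_name_on_page","9999-12-31"]'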

Not sure about tag clouds. Academically that sounds like a reduce
function, but it'd depend on how you do things. Something like for(tag
in doc.tags) {emit(tag, 1);}, and your reduce is a simple
function(keys, values) { return sum(values);}
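
Spelled out as a full pair of view functions, that would look something like
the sketch below.  Note that for..in over a JavaScript array yields indices,
so the map indexes into doc.tags; sum() is the helper mentioned above:

function(doc) {
  // map: one row per tag occurrence
  for (var i in doc.tags) {
    emit(doc.tags[i], 1);
  }
}

function(keys, values) {
  // reduce: total occurrences per tag; query with group=true to get one
  // (tag, count) row per tag for the cloud.  Partial sums are numbers,
  // so the same function also works when it is rereduced.
  return sum(values);
}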

Sorry about the some_value_that_sorts_after_last_possible_date. I was
too lazy to lookup what JSON value would sort after an integer or
string.

Also, you may reconsider your document design. Obviously I have no
idea what your data looks like, but perhaps instead of adding multiple
docs with the same doc.name, you just update the original doc to have
array keys. Not always possible, but it's a thought.

HTH,
Paul

On Wed, Aug 20, 2008 at 5:45 PM, Nicholas Retallack
<ni...@gmail.com> wrote:
> Oh clever.  I was considering a solution like this, but I was worried
> I wouldn't know where to stop, and might end up chopping it between
> some documents that should be grouped together.
> "some_value_that_sorts_after_last_possible_date" solves that problem.
> There's another problem though, for when I want to do pagination.  Say
> I want to display exactly 100 of these on a page.  How do I know I've
> fetched 100 of them, if any number of documents could be in a group?
> Also, how would I know what document name appears 100 documents ahead
> of this one?  This gets messy...
>
> Essentially I figured this should be a task the database is capable of
> doing on its own.  I don't want every action in my web application to
> have to solve the caching problem on its own, after doing serious
> data-munging on all this ugly stuff I got back from the database.  How
> do I know when the cache should be invalidated anyway, without insider
> knowledge from the database?
>
> Hm, cleverness.  I guess I could figure out what every hundredth name
> is by making a view for just the names and querying that.  Any
> efficient way to reduce that list for uniqueness?  Perhaps group=true
> and reduce = function(){return true}.  There should be a wiki page
> devoted to these silly tricks, like this hackish way to put together
> pagination.  And tag clouds.
>
> On Wed, Aug 20, 2008 at 1:56 PM, Paul Davis <pa...@gmail.com> wrote:
>> If I'm not mistaken, you have a number of documents that all have a
>> given 'name'. And you want the list of elements for each value of
>> 'name'. To accomplish this in db-land, you could use a design
>> document like [1].
>>
>> Then to get the data for any given doc name, you'd query your map like
>> [2]. This gets you everything emitted with a given doc name. The
>> underlying idea to remember in getting data out of couch is that your
>> maps should emit things that sort together. Then you can use 'slice'
>> operations to pull at the documents you need.
>>
>> Your values aren't magically in one array, but merging the arrays in
>> app-land is easy enough.
>>
>> If I've completely screwed up what you were going after, let me know.
>>
>> [1] http://www.friendpaste.com/2AHz3ahr
>> [2] http://localhost:5984/dbname/_view/design_docid/index?startkey=["docname"]&endkey=["docname",
>> some_value_that_sorts_after_last_possible_date]
>>
>> Paul
>>
>> On Wed, Aug 20, 2008 at 4:32 PM, Nicholas Retallack
>> <ni...@gmail.com> wrote:
>>> Replacing 'return values' with 'return values.length' shows you're
>>> right.  4 minutes for the first query, milliseconds afterward, as
>>> opposed to forever.
>>>
>>> I guess I was expecting reduce to do things it wasn't designed to do.
>>> I notice ?group=true&group_level=1 is ignored unless a reduce function
>>> of some sort exists though.  Is there any way to get this grouping
>>> behavior without such extreme reductions in result size / performance?
>>>
>>> The view I was using here (http://www.friendpaste.com/2AHz3ahr) was
>>> designed to simply take each document with the same name and merge
>>> them into one document, turning same-named fields into lists (here's a
>>> more general version http://www.friendpaste.com/Ud6ELaXC).  This
>>> reduces the document size, but only by whatever overhead the repeated
>>> field names would add.  The fields I was reducing only contained
>>> integers, so reduction did shrink documents by quite a bit.  It was
>>> pretty handy, but the query took 25 seconds to return one result even
>>> when called repeatedly.
>>>
>>> Is there some technical reason for this limitation?
>>>
>>> I had assumed reduce was just an ordinary post-processing step that I
>>> could run once and have something akin to a brand new generated table
>>> to query on, so I wrote my views to transform my data to fit the
>>> various ways I wanted to view it.  It worked fine for small amounts of
>>> data in little experiments, but as soon as I used it on my real
>>> database, I hit this wall.
>>>
>>> Are there plans to make reduce work for these more general
>>> data-mangling tasks?  Or should I be approaching the problem a
>>> different way?  Perhaps write my map calls differently so they produce
>>> more rows for reduce to compact?  Or do something special if the third
>>> parameter to reduce is true?
>>>
>>> On Tue, Aug 19, 2008 at 5:41 PM, Damien Katz <da...@apache.org> wrote:
>>>> You can return arrays and objects, whatever json allows. But if the object
>>>> keeps getting bigger the more rows it reduces, then it simply won't work.
>>>>
>>>> The exception is that the size of the reduce value can be logarithmic with
>>>> respect to the rows. The simplest example of logarithmic growth is the
>>>> summing of a row value. With Erlang's bignums, the size on disk is
>>>> Log2(Sum(Rows)), which is perfectly acceptable growth.
>>>>
>>>> -Damien
>>>>
>>>> On Aug 19, 2008, at 8:14 PM, Nicholas Retallack wrote:
>>>>
>>>>> Oh!  I didn't realize that was a rule.  I had used 'return values' in
>>>>> an attempt to run the simplest test possible on my data.  But hey, values is
>>>>> an
>>>>> array.  Does that mean you're not allowed to return objects like arrays
>>>>> from
>>>>> reduce at all?  Because I was kind of hoping I could.  I was able to do it
>>>>> with smaller amounts of data, after all.  Perhaps this is due to re-reduce
>>>>> kicking in?
>>>>>
>>>>> For the record, couchdb is still working on this query I started hours
>>>>> ago,
>>>>> and chewing up all my cpu.  I am going to have to kill it so I can get
>>>>> some
>>>>> work done.
>>>>>
>>>>> On Tue, Aug 19, 2008 at 4:21 PM, Damien Katz <da...@apache.org> wrote:
>>>>>
>>>>>> I think the problem with your reduce is that it looks like it's not
>>>>>> actually
>>>>>> reducing to a single value, but instead using reduce for grouping data.
>>>>>> That
>>>>>> will cause severe performance problems.
>>>>>>
>>>>>> For reduce to work properly, you should end up with a fixed size data
>>>>>> structure regardless of the number of values being reduced (not strictly
>>>>>> true, but that's the general rule).
>>>>>>
>>>>>> -Damien
>>>>>>
>>>>>>
>>>>>> On Aug 19, 2008, at 6:55 PM, Nicholas Retallack wrote:
>>>>>>
>>>>>> Okay, I got it built on gentoo instead, but I'm still having performance
>>>>>>>
>>>>>>> issues with reduce.
>>>>>>>
>>>>>>> Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [async-threads:0]
>>>>>>> couchdb - Apache CouchDB 0.8.1-incubating
>>>>>>>
>>>>>>> Here's a query I tried to do:
>>>>>>>
>>>>>>> I freshly imported about 191MB of data in 155399 documents.  29090 are
>>>>>>> not
>>>>>>> discarded by map.  Map produces one row with 5 fields for each of these
>>>>>>> documents.  After grouping, each group should have four rows.  Reduce is
>>>>>>> a
>>>>>>> simple function(keys,values){return values}.
>>>>>>>
>>>>>>> Here's the query call:
>>>>>>> time curl -X GET '
>>>>>>>
>>>>>>>
>>>>>>> http://localhost:5984/clickfund/_view/offers/index?count=1&group=true&group_level=1
>>>>>>> '
>>>>>>>
>>>>>>> This is running on a 512MB slicehost account.  http://www.slicehost.com/
>>>>>>>
>>>>>>> I'd love to give you this command's execution time, since I ran it last
>>>>>>> night before I went to bed, but it must have taken over an hour because
>>>>>>> my
>>>>>>> laptop went to sleep and severed the connection.  Trying it again.
>>>>>>>
>>>>>>> Considering it's blazing fast without the reduce function, I can only
>>>>>>> assume
>>>>>>> what's taking all this time is overhead setting up and tearing down the
>>>>>>> simple function(keys,values){return values}.
>>>>>>>
>>>>>>> I can give you guys the python source to set up this database so you can
>>>>>>> try
>>>>>>> it yourself if you like.
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>
>>
>

Re: Reduce is Really Slow!

Posted by Nicholas Retallack <ni...@gmail.com>.
Oh clever.  I was considering a solution like this, but I was worried
I wouldn't know where to stop, and might end up chopping it between
some documents that should be grouped together.
"some_value_that_sorts_after_last_possible_date" solves that problem.
There's another problem though, for when I want to do pagination.  Say
I want to display exactly 100 of these on a page.  How do I know I've
fetched 100 of them, if any number of documents could be in a group?
Also, how would I know what document name appears 100 documents ahead
of this one?  This gets messy...

Essentially I figured this should be a task the database is capable of
doing on its own.  I don't want every action in my web application to
have to solve the caching problem on its own, after doing serious
data-munging on all this ugly stuff I got back from the database.  How
do I know when the cache should be invalidated anyway, without insider
knowledge from the database?

Hm, cleverness.  I guess I could figure out what every hundredth name
is by making a view for just the names and querying that.  Any
efficient way to reduce that list for uniqueness?  Perhaps group=true
and reduce = function(){return true}.  There should be a wiki page
devoted to these silly tricks, like this hackish way to put together
pagination.  And tag clouds.
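
For reference, the names-only view described here is essentially what Paul
suggests elsewhere in the thread; a sketch, assuming the same doc.Type filter
as the original view:

function(doc) {
  // map: one row per document, keyed by name
  if (doc.Type == 'offer') {
    emit(doc.name, null);
  }
}

function(keys, values) {
  // reduce: constant-size value, so grouping stays cheap
  return 1;
}

Queried with ?group=true&count=101&startkey="<first name on the page>", each
distinct name comes back as one row, and the 101st row is where the next
page starts.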

On Wed, Aug 20, 2008 at 1:56 PM, Paul Davis <pa...@gmail.com> wrote:
> If I'm not mistaken, you have a number of documents that all have a
> given 'name'. And you want the list of elements for each value of
> 'name'. To accomplish this in db-land, you could use a design
> document like [1].
>
> Then to get the data for any given doc name, you'd query your map like
> [2]. This gets you everything emitted with a given doc name. The
> underlying idea to remember in getting data out of couch is that your
> maps should emit things that sort together. Then you can use 'slice'
> operations to pull at the documents you need.
>
> Your values aren't magically in one array, but merging the arrays in
> app-land is easy enough.
>
> If I've completely screwed up what you were going after, let me know.
>
> [1] http://www.friendpaste.com/2AHz3ahr
> [2] http://localhost:5984/dbname/_view/design_docid/index?startkey=["docname"]&endkey=["docname",
> some_value_that_sorts_after_last_possible_date]
>
> Paul
>
> On Wed, Aug 20, 2008 at 4:32 PM, Nicholas Retallack
> <ni...@gmail.com> wrote:
>> Replacing 'return values' with 'return values.length' shows you're
>> right.  4 minutes for the first query, milliseconds afterward, as
>> opposed to forever.
>>
>> I guess I was expecting reduce to do things it wasn't designed to do.
>> I notice ?group=true&group_level=1 is ignored unless a reduce function
>> of some sort exists though.  Is there any way to get this grouping
>> behavior without such extreme reductions in result size / performance?
>>
>> The view I was using here (http://www.friendpaste.com/2AHz3ahr) was
>> designed to simply take each document with the same name and merge
>> them into one document, turning same-named fields into lists (here's a
>> more general version http://www.friendpaste.com/Ud6ELaXC).  This
>> reduces the document size, but only by whatever overhead the repeated
>> field names would add.  The fields I was reducing only contained
>> integers, so reduction did shrink documents by quite a bit.  It was
>> pretty handy, but the query took 25 seconds to return one result even
>> when called repeatedly.
>>
>> Is there some technical reason for this limitation?
>>
>> I had assumed reduce was just an ordinary post-processing step that I
>> could run once and have something akin to a brand new generated table
>> to query on, so I wrote my views to transform my data to fit the
>> various ways I wanted to view it.  It worked fine for small amounts of
>> data in little experiments, but as soon as I used it on my real
>> database, I hit this wall.
>>
>> Are there plans to make reduce work for these more general
>> data-mangling tasks?  Or should I be approaching the problem a
>> different way?  Perhaps write my map calls differently so they produce
>> more rows for reduce to compact?  Or do something special if the third
>> parameter to reduce is true?
>>
>> On Tue, Aug 19, 2008 at 5:41 PM, Damien Katz <da...@apache.org> wrote:
>>> You can return arrays and objects, whatever json allows. But if the object
>>> keeps getting bigger the more rows it reduces, then it simply won't work.
>>>
>>> The exception is that the size of the reduce value can be logarithmic with
>>> respect to the rows. The simplest example of logarithmic growth is the
>>> summing of a row value. With Erlang's bignums, the size on disk is
>>> Log2(Sum(Rows)), which is perfectly acceptable growth.
>>>
>>> -Damien
>>>
>>> On Aug 19, 2008, at 8:14 PM, Nicholas Retallack wrote:
>>>
>>>> Oh!  I didn't realize that was a rule.  I had used 'return values' in
>>>> an attempt to run the simplest test possible on my data.  But hey, values is
>>>> an
>>>> array.  Does that mean you're not allowed to return objects like arrays
>>>> from
>>>> reduce at all?  Because I was kind of hoping I could.  I was able to do it
>>>> with smaller amounts of data, after all.  Perhaps this is due to re-reduce
>>>> kicking in?
>>>>
>>>> For the record, couchdb is still working on this query I started hours
>>>> ago,
>>>> and chewing up all my cpu.  I am going to have to kill it so I can get
>>>> some
>>>> work done.
>>>>
>>>> On Tue, Aug 19, 2008 at 4:21 PM, Damien Katz <da...@apache.org> wrote:
>>>>
>>>>> I think the problem with your reduce is that it looks like it's not
>>>>> actually
>>>>> reducing to a single value, but instead using reduce for grouping data.
>>>>> That
>>>>> will cause severe performance problems.
>>>>>
>>>>> For reduce to work properly, you should end up with a fixed size data
>>>>> structure regardless of the number of values being reduced (not strictly
>>>>> true, but that's the general rule).
>>>>>
>>>>> -Damien
>>>>>
>>>>>
>>>>> On Aug 19, 2008, at 6:55 PM, Nicholas Retallack wrote:
>>>>>
>>>>> Okay, I got it built on gentoo instead, but I'm still having performance
>>>>>>
>>>>>> issues with reduce.
>>>>>>
>>>>>> Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [async-threads:0]
>>>>>> couchdb - Apache CouchDB 0.8.1-incubating
>>>>>>
>>>>>> Here's a query I tried to do:
>>>>>>
>>>>>> I freshly imported about 191MB of data in 155399 documents.  29090 are
>>>>>> not
>>>>>> discarded by map.  Map produces one row with 5 fields for each of these
>>>>>> documents.  After grouping, each group should have four rows.  Reduce is
>>>>>> a
>>>>>> simple function(keys,values){return values}.
>>>>>>
>>>>>> Here's the query call:
>>>>>> time curl -X GET '
>>>>>>
>>>>>>
>>>>>> http://localhost:5984/clickfund/_view/offers/index?count=1&group=true&group_level=1
>>>>>> '
>>>>>>
>>>>>> This is running on a 512MB slicehost account.  http://www.slicehost.com/
>>>>>>
>>>>>> I'd love to give you this command's execution time, since I ran it last
>>>>>> night before I went to bed, but it must have taken over an hour because
>>>>>> my
>>>>>> laptop went to sleep and severed the connection.  Trying it again.
>>>>>>
>>>>>> Considering it's blazing fast without the reduce function, I can only
>>>>>> assume
>>>>>> what's taking all this time is overhead setting up and tearing down the
>>>>>> simple function(keys,values){return values}.
>>>>>>
>>>>>> I can give you guys the python source to set up this database so you can
>>>>>> try
>>>>>> it yourself if you like.
>>>>>>
>>>>>
>>>>>
>>>
>>>
>>
>

Re: Reduce is Really Slow!

Posted by Paul Davis <pa...@gmail.com>.
If I'm not mistaken, you have a number of documents that all have a
given 'name'. And you want the list of elements for each value of
'name'. To accomplish this in db-land, you could use a design
document like [1].

Then to get the data for any given doc name, you'd query your map like
[2]. This gets you everything emitted with a given doc name. The
underlying idea to remember in getting data out of couch is that your
maps should emit things that sort together. Then you can use 'slice'
operations to pull at the documents you need.

Your values aren't magically in one array, but merging the arrays in
app-land is easy enough.

If I've completely screwed up what you were going after, let me know.

[1] http://www.friendpaste.com/2AHz3ahr
[2] http://localhost:5984/dbname/_view/design_docid/index?startkey=["docname"]&endkey=["docname",
some_value_that_sorts_after_last_possible_date]
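
The paste behind [1] is not reproduced here, but the design it implies is a
map keyed on [name, date] with no reduce at all; a sketch, where doc.date is
an assumed field name for whatever the documents use to record the import
date:

function(doc) {
  // map: all rows for one name sort together, ordered by date
  if (doc.Type == 'offer') {
    emit([doc.name, doc.date], doc);
  }
}

The [2] query then slices out one name's rows.  Anything that sorts after
the last real date works as the end of the range, e.g.
endkey=["docname","9999-12-31"] if the dates are ISO-style strings.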

Paul

On Wed, Aug 20, 2008 at 4:32 PM, Nicholas Retallack
<ni...@gmail.com> wrote:
> Replacing 'return values' with 'return values.length' shows you're
> right.  4 minutes for the first query, milliseconds afterward, as
> opposed to forever.
>
> I guess I was expecting reduce to do things it wasn't designed to do.
> I notice ?group=true&group_level=1 is ignored unless a reduce function
> of some sort exists though.  Is there any way to get this grouping
> behavior without such extreme reductions in result size / performance?
>
> The view I was using here (http://www.friendpaste.com/2AHz3ahr) was
> designed to simply take each document with the same name and merge
> them into one document, turning same-named fields into lists (here's a
> more general version http://www.friendpaste.com/Ud6ELaXC).  This
> reduces the document size, but only by whatever overhead the repeated
> field names would add.  The fields I was reducing only contained
> integers, so reduction did shrink documents by quite a bit.  It was
> pretty handy, but the query took 25 seconds to return one result even
> when called repeatedly.
>
> Is there some technical reason for this limitation?
>
> I had assumed reduce was just an ordinary post-processing step that I
> could run once and have something akin to a brand new generated table
> to query on, so I wrote my views to transform my data to fit the
> various ways I wanted to view it.  It worked fine for small amounts of
> data in little experiments, but as soon as I used it on my real
> database, I hit this wall.
>
> Are there plans to make reduce work for these more general
> data-mangling tasks?  Or should I be approaching the problem a
> different way?  Perhaps write my map calls differently so they produce
> more rows for reduce to compact?  Or do something special if the third
> parameter to reduce is true?
>
> On Tue, Aug 19, 2008 at 5:41 PM, Damien Katz <da...@apache.org> wrote:
>> You can return arrays and objects, whatever json allows. But if the object
>> keeps getting bigger the more rows it reduces, then it simply won't work.
>>
>> The exception is that the size of the reduce value can be logarithmic with
>> respect to the rows. The simplest example of logarithmic growth is the
>> summing of a row value. With Erlang's bignums, the size on disk is
>> Log2(Sum(Rows)), which is perfectly acceptable growth.
>>
>> -Damien
>>
>> On Aug 19, 2008, at 8:14 PM, Nicholas Retallack wrote:
>>
>>> Oh!  I didn't realize that was a rule.  I had used 'return values' in
>>> an attempt to run the simplest test possible on my data.  But hey, values is
>>> an
>>> array.  Does that mean you're not allowed to return objects like arrays
>>> from
>>> reduce at all?  Because I was kind of hoping I could.  I was able to do it
>>> with smaller amounts of data, after all.  Perhaps this is due to re-reduce
>>> kicking in?
>>>
>>> For the record, couchdb is still working on this query I started hours
>>> ago,
>>> and chewing up all my cpu.  I am going to have to kill it so I can get
>>> some
>>> work done.
>>>
>>> On Tue, Aug 19, 2008 at 4:21 PM, Damien Katz <da...@apache.org> wrote:
>>>
>>>> I think the problem with your reduce is that it looks like it's not
>>>> actually
>>>> reducing to a single value, but instead using reduce for grouping data.
>>>> That
>>>> will cause severe performance problems.
>>>>
>>>> For reduce to work properly, you should end up with a fixed size data
>>>> structure regardless of the number of values being reduced (not strictly
>>>> true, but that's the general rule).
>>>>
>>>> -Damien
>>>>
>>>>
>>>> On Aug 19, 2008, at 6:55 PM, Nicholas Retallack wrote:
>>>>
>>>> Okay, I got it built on gentoo instead, but I'm still having performance
>>>>>
>>>>> issues with reduce.
>>>>>
>>>>> Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [async-threads:0]
>>>>> couchdb - Apache CouchDB 0.8.1-incubating
>>>>>
>>>>> Here's a query I tried to do:
>>>>>
>>>>> I freshly imported about 191MB of data in 155399 documents.  29090 are
>>>>> not
>>>>> discarded by map.  Map produces one row with 5 fields for each of these
>>>>> documents.  After grouping, each group should have four rows.  Reduce is
>>>>> a
>>>>> simple function(keys,values){return values}.
>>>>>
>>>>> Here's the query call:
>>>>> time curl -X GET '
>>>>>
>>>>>
>>>>> http://localhost:5984/clickfund/_view/offers/index?count=1&group=true&group_level=1
>>>>> '
>>>>>
>>>>> This is running on a 512MB slicehost account.  http://www.slicehost.com/
>>>>>
>>>>> I'd love to give you this command's execution time, since I ran it last
>>>>> night before I went to bed, but it must have taken over an hour because
>>>>> my
>>>>> laptop went to sleep and severed the connection.  Trying it again.
>>>>>
>>>>> Considering it's blazing fast without the reduce function, I can only
>>>>> assume
>>>>> what's taking all this time is overhead setting up and tearing down the
>>>>> simple function(keys,values){return values}.
>>>>>
>>>>> I can give you guys the python source to set up this database so you can
>>>>> try
>>>>> it yourself if you like.
>>>>>
>>>>
>>>>
>>
>>
>

Re: Reduce is Really Slow!

Posted by Chris Anderson <jc...@grabb.it>.
Paul's advice is right on.  If you can get the data using a range query on a
map view (without reduce), you should do that; if you need to aggregate very
many rows into a short value, reduce is your friend.

On Wed, Aug 20, 2008 at 1:32 PM, Nicholas Retallack
<ni...@gmail.com> wrote:
> Replacing 'return values' with 'return values.length' shows you're
> right.  4 minutes for the first query, milliseconds afterward, as
> opposed to forever.
>

That sounds like the query times I'm getting.

>
> Are there plans to make reduce work for these more general
> data-mangling tasks?  Or should I be approaching the problem a
> different way?  Perhaps write my map calls differently so they produce
> more rows for reduce to compact?  Or do something special if the third
> parameter to reduce is true?
>

"Plans" would be a strong term, but I've been digging through the
source lately thinking about ways to make a more Hadoop-like map
process. I've prototyped remap in Ruby
http://github.com/jchris/couchrest/tree/master/utils/remap.rb

The driving use case is a list of URLs, as output from a view, that
are each fetched by the view server (robots.txt etc etc), with the
fetched results stored as new documents. Essentially a Nutch
implementation backed by CouchDB.

Of course this could be an application process running against the
HTTP API, but CouchDB's view-server plugin architecture could make
managing data even easier than Hadoop does.

I've got my crazy idea hat on, so don't expect to see this in trunk soon. ;)

Chris


-- 
Chris Anderson
http://jchris.mfdz.com

Re: Reduce is Really Slow!

Posted by Nicholas Retallack <ni...@gmail.com>.
Replacing 'return values' with 'return values.length' shows you're
right.  4 minutes for the first query, milliseconds afterward, as
opposed to forever.

I guess I was expecting reduce to do things it wasn't designed to do.
I notice ?group=true&group_level=1 is ignored unless a reduce function
of some sort exists though.  Is there any way to get this grouping
behavior without such extreme reductions in result size / performance?

The view I was using here (http://www.friendpaste.com/2AHz3ahr) was
designed to simply take each document with the same name and merge
them into one document, turning same-named fields into lists (here's a
more general version http://www.friendpaste.com/Ud6ELaXC).  This
reduces the document size, but only by whatever overhead the repeated
field names would add.  The fields I was reducing only contained
integers, so reduction did shrink documents by quite a bit.  It was
pretty handy, but the query took 25 seconds to return one result even
when called repeatedly.

Is there some technical reason for this limitation?

I had assumed reduce was just an ordinary post-processing step that I
could run once and have something akin to a brand new generated table
to query on, so I wrote my views to transform my data to fit the
various ways I wanted to view it.  It worked fine for small amounts of
data in little experiments, but as soon as I used it on my real
database, I hit this wall.

Are there plans to make reduce work for these more general
data-mangling tasks?  Or should I be approaching the problem a
different way?  Perhaps write my map calls differently so they produce
more rows for reduce to compact?  Or do something special if the third
parameter to reduce is true?
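
For what it's worth, the third parameter is the rereduce flag: false when
the function is called on rows coming straight from the map, true when it is
called on the outputs of earlier reduce calls.  A count that handles both
cases looks something like this sketch:

function(keys, values, rereduce) {
  if (rereduce) {
    // values are partial counts from earlier reduce calls
    return sum(values);
  }
  // values are the raw mapped values for these keys
  return values.length;
}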

On Tue, Aug 19, 2008 at 5:41 PM, Damien Katz <da...@apache.org> wrote:
> You can return arrays and objects, whatever json allows. But if the object
> keeps getting bigger the more rows it reduces, then it simply won't work.
>
> The exception is that the size of the reduce value can be logarithmic with
> respect to the rows. The simplest example of logarithmic growth is the
> summing of a row value. With Erlang's bignums, the size on disk is
> Log2(Sum(Rows)), which is perfectly acceptable growth.
>
> -Damien
>
> On Aug 19, 2008, at 8:14 PM, Nicholas Retallack wrote:
>
>> Oh!  I didn't realize that was a rule.  I had used 'return values' in
>> an attempt to run the simplest test possible on my data.  But hey, values is
>> an
>> array.  Does that mean you're not allowed to return objects like arrays
>> from
>> reduce at all?  Because I was kind of hoping I could.  I was able to do it
>> with smaller amounts of data, after all.  Perhaps this is due to re-reduce
>> kicking in?
>>
>> For the record, couchdb is still working on this query I started hours
>> ago,
>> and chewing up all my cpu.  I am going to have to kill it so I can get
>> some
>> work done.
>>
>> On Tue, Aug 19, 2008 at 4:21 PM, Damien Katz <da...@apache.org> wrote:
>>
>>> I think the problem with your reduce is that it looks like it's not
>>> actually
>>> reducing to a single value, but instead using reduce for grouping data.
>>> That
>>> will cause severe performance problems.
>>>
>>> For reduce to work properly, you should end up with a fixed size data
>>> structure regardless of the number of values being reduced (not strictly
>>> true, but that's the general rule).
>>>
>>> -Damien
>>>
>>>
>>> On Aug 19, 2008, at 6:55 PM, Nicholas Retallack wrote:
>>>
>>> Okay, I got it built on gentoo instead, but I'm still having performance
>>>>
>>>> issues with reduce.
>>>>
>>>> Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [async-threads:0]
>>>> couchdb - Apache CouchDB 0.8.1-incubating
>>>>
>>>> Here's a query I tried to do:
>>>>
>>>> I freshly imported about 191MB of data in 155399 documents.  29090 are
>>>> not
>>>> discarded by map.  Map produces one row with 5 fields for each of these
>>>> documents.  After grouping, each group should have four rows.  Reduce is
>>>> a
>>>> simple function(keys,values){return values}.
>>>>
>>>> Here's the query call:
>>>> time curl -X GET '
>>>>
>>>>
>>>> http://localhost:5984/clickfund/_view/offers/index?count=1&group=true&group_level=1
>>>> '
>>>>
>>>> This is running on a 512MB slicehost account.  http://www.slicehost.com/
>>>>
>>>> I'd love to give you this command's execution time, since I ran it last
>>>> night before I went to bed, but it must have taken over an hour because
>>>> my
>>>> laptop went to sleep and severed the connection.  Trying it again.
>>>>
>>>> Considering it's blazing fast without the reduce function, I can only
>>>> assume
>>>> what's taking all this time is overhead setting up and tearing down the
>>>> simple function(keys,values){return values}.
>>>>
>>>> I can give you guys the python source to set up this database so you can
>>>> try
>>>> it yourself if you like.
>>>>
>>>
>>>
>
>

Re: Reduce is Really Slow!

Posted by Damien Katz <da...@apache.org>.
You can return arrays and objects, whatever json allows. But if the  
object keeps getting bigger the more rows it reduces, then it simply  
won't work.

The exception is that the size of the reduce value can be logarithmic  
with respect to the rows. The simplest example of logarithmic growth  
is the summing of a row value. With Erlang's bignums, the size on disk
is Log2(Sum(Rows)), which is perfectly acceptable growth.
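
Concretely, the two cases from this thread look like the sketch below: the
first is the growing-value reduce from the original view, the second is the
fixed-size sum Damien describes (sum() being the helper used elsewhere in
the thread):

function(keys, values) {
  // grows with the number of rows reduced; the value stored at every
  // inner node of the view btree keeps getting bigger, so it won't work
  return values;
}

function(keys, values) {
  // logarithmic growth: the stored value is just a number
  return sum(values);
}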

-Damien

On Aug 19, 2008, at 8:14 PM, Nicholas Retallack wrote:

> Oh!  I didn't realize that was a rule.  I had used 'return values' in
> an attempt to run the simplest test possible on my data.  But hey,
> values is an
> array.  Does that mean you're not allowed to return objects like  
> arrays from
> reduce at all?  Because I was kind of hoping I could.  I was able to  
> do it
> with smaller amounts of data, after all.  Perhaps this is due to re- 
> reduce
> kicking in?
>
> For the record, couchdb is still working on this query I started  
> hours ago,
> and chewing up all my cpu.  I am going to have to kill it so I can  
> get some
> work done.
>
> On Tue, Aug 19, 2008 at 4:21 PM, Damien Katz <da...@apache.org>  
> wrote:
>
>> I think the problem with your reduce is that it looks like it's not
>> actually
>> reducing to a single value, but instead using reduce for grouping  
>> data. That
>> will cause severe performance problems.
>>
>> For reduce to work properly, you should end up with a fixed size data
>> structure regardless of the number of values being reduced (not  
>> strictly
>> true, but that's the general rule).
>>
>> -Damien
>>
>>
>> On Aug 19, 2008, at 6:55 PM, Nicholas Retallack wrote:
>>
>> Okay, I got it built on gentoo instead, but I'm still having  
>> performance
>>> issues with reduce.
>>>
>>> Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [async- 
>>> threads:0]
>>> couchdb - Apache CouchDB 0.8.1-incubating
>>>
>>> Here's a query I tried to do:
>>>
>>> I freshly imported about 191MB of data in 155399 documents.  29090  
>>> are not
>>> discarded by map.  Map produces one row with 5 fields for each of  
>>> these
>>> documents.  After grouping, each group should have four rows.   
>>> Reduce is a
>>> simple function(keys,values){return values}.
>>>
>>> Here's the query call:
>>> time curl -X GET '
>>>
>>> http://localhost:5984/clickfund/_view/offers/index?count=1&group=true&group_level=1
>>> '
>>>
>>> This is running on a 512MB slicehost account.  http://www.slicehost.com/
>>>
>>> I'd love to give you this command's execution time, since I ran it  
>>> last
>>> night before I went to bed, but it must have taken over an hour  
>>> because my
>>> laptop went to sleep and severed the connection.  Trying it again.
>>>
>>> Considering it's blazing fast without the reduce function, I can  
>>> only
>>> assume
>>> what's taking all this time is overhead setting up and tearing  
>>> down the
>>> simple function(keys,values){return values}.
>>>
>>> I can give you guys the python source to set up this database so  
>>> you can
>>> try
>>> it yourself if you like.
>>>
>>
>>


Re: Reduce is Really Slow!

Posted by Nicholas Retallack <ni...@gmail.com>.
Oh!  I didn't realize that was a rule.  I had used 'return values' in
an attempt to run the simplest test possible on my data.  But hey, values is an
array.  Does that mean you're not allowed to return objects like arrays from
reduce at all?  Because I was kind of hoping I could.  I was able to do it
with smaller amounts of data, after all.  Perhaps this is due to re-reduce
kicking in?

For the record, couchdb is still working on this query I started hours ago,
and chewing up all my cpu.  I am going to have to kill it so I can get some
work done.

On Tue, Aug 19, 2008 at 4:21 PM, Damien Katz <da...@apache.org> wrote:

> I think the problem with your reduce is that it looks like it's not actually
> reducing to a single value, but instead using reduce for grouping data. That
> will cause severe performance problems.
>
> For reduce to work properly, you should end up with a fixed size data
> structure regardless of the number of values being reduced (not strictly
> true, but that's the general rule).
>
> -Damien
>
>
> On Aug 19, 2008, at 6:55 PM, Nicholas Retallack wrote:
>
>  Okay, I got it built on gentoo instead, but I'm still having performance
>> issues with reduce.
>>
>> Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [async-threads:0]
>> couchdb - Apache CouchDB 0.8.1-incubating
>>
>> Here's a query I tried to do:
>>
>> I freshly imported about 191MB of data in 155399 documents.  29090 are not
>> discarded by map.  Map produces one row with 5 fields for each of these
>> documents.  After grouping, each group should have four rows.  Reduce is a
>> simple function(keys,values){return values}.
>>
>> Here's the query call:
>> time curl -X GET '
>>
>> http://localhost:5984/clickfund/_view/offers/index?count=1&group=true&group_level=1
>> '
>>
>> This is running on a 512MB slicehost account.  http://www.slicehost.com/
>>
>> I'd love to give you this command's execution time, since I ran it last
>> night before I went to bed, but it must have taken over an hour because my
>> laptop went to sleep and severed the connection.  Trying it again.
>>
>> Considering it's blazing fast without the reduce function, I can only
>> assume
>> what's taking all this time is overhead setting up and tearing down the
>> simple function(keys,values){return values}.
>>
>> I can give you guys the python source to set up this database so you can
>> try
>> it yourself if you like.
>>
>
>

Re: Reduce is Really Slow!

Posted by Damien Katz <da...@apache.org>.
I think the problem with your reduce is that it looks like it's not
actually reducing to a single value, but instead using reduce for  
grouping data. That will cause severe performance problems.

For reduce to work properly, you should end up with a fixed size data  
structure regardless of the number of values being reduced (not  
strictly true, but that's the general rule).

-Damien

On Aug 19, 2008, at 6:55 PM, Nicholas Retallack wrote:

> Okay, I got it built on gentoo instead, but I'm still having  
> performance
> issues with reduce.
>
> Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [async- 
> threads:0]
> couchdb - Apache CouchDB 0.8.1-incubating
>
> Here's a query I tried to do:
>
> I freshly imported about 191MB of data in 155399 documents.  29090  
> are not
> discarded by map.  Map produces one row with 5 fields for each of  
> these
> documents.  After grouping, each group should have four rows.   
> Reduce is a
> simple function(keys,values){return values}.
>
> Here's the query call:
> time curl -X GET '
> http://localhost:5984/clickfund/_view/offers/index?count=1&group=true&group_level=1
> '
>
> This is running on a 512MB slicehost account.  http://www.slicehost.com/
>
> I'd love to give you this command's execution time, since I ran it  
> last
> night before I went to bed, but it must have taken over an hour  
> because my
> laptop went to sleep and severed the connection.  Trying it again.
>
> Considering it's blazing fast without the reduce function, I can  
> only assume
> what's taking all this time is overhead setting up and tearing down  
> the
> simple function(keys,values){return values}.
>
> I can give you guys the python source to set up this database so you  
> can try
> it yourself if you like.


Re: Reduce is Really Slow!

Posted by Nicholas Retallack <ni...@gmail.com>.
Okay, I got it built on gentoo instead, but I'm still having performance
issues with reduce.

Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [async-threads:0]
couchdb - Apache CouchDB 0.8.1-incubating

Here's a query I tried to do:

I freshly imported about 191MB of data in 155399 documents.  29090 are not
discarded by map.  Map produces one row with 5 fields for each of these
documents.  After grouping, each group should have four rows.  Reduce is a
simple function(keys,values){return values}.

Here's the query call:
time curl -X GET '
http://localhost:5984/clickfund/_view/offers/index?count=1&group=true&group_level=1
'

This is running on a 512MB slicehost account.  http://www.slicehost.com/

I'd love to give you this command's execution time, since I ran it last
night before I went to bed, but it must have taken over an hour because my
laptop went to sleep and severed the connection.  Trying it again.

Considering it's blazing fast without the reduce function, I can only assume
what's taking all this time is overhead setting up and tearing down the
simple function(keys,values){return values}.

I can give you guys the python source to set up this database so you can try
it yourself if you like.

Re: Reduce is Really Slow!

Posted by Sho Fukamachi <sh...@gmail.com>.
On 19/08/2008, at 4:27 PM, Nicholas Retallack wrote:

I can't speak to your reduce experience, other than to say that it's
totally at odds with mine, but I can help with the server issue:

> I am using couchdbx 0.8.0  I have been told that 0.8.1 improves on
> javascript views, but I was unable to get it working on my mac.  It  
> compiles
> and runs, but when I try to connect to it I am refused.  This is  
> strange,
> because Couchdbx works fine when I run it on the same port.

I believe Mac OS X by default tries to resolve localhost to IPv6 (::1)
first, and CouchDB only responds to IPv4. You can try disabling IPv6 on
your Mac, accessing CouchDB at 127.0.0.1, or setting the BindAddress in the
config file to 0.0.0.0.
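
For example, with the couch.ini shown near the top of the thread, that would
mean changing

BindAddress="127.0.0.1"

to

BindAddress="0.0.0.0"

or, leaving the config alone, querying the IPv4 loopback directly:

curl -X GET http://127.0.0.1:5984/_all_dbs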

I know people hate hearing "works for me!" but .. works for me : ) try  
the above and see if it improves your situation.

Sho