Posted to user@couchdb.apache.org by Jarrod Roberson <ja...@vertigrated.com> on 2010/04/18 07:37:12 UTC

Is there anyway to specify a group_level of "id"?

I have a reduce function that outputs keys with duplicate ids when run
with group_level=1. I would really like to say something like
group_level=id and have it group on id rather than on my key.
Is this possible?

-- 
Jarrod Roberson

Re: Is there anyway to specify a group_level of "id"?

Posted by J Chris Anderson <jc...@gmail.com>.

On Apr 19, 2010, at 9:27 AM, Jarrod Roberson wrote:

> On Mon, Apr 19, 2010 at 10:10 AM, Adam Kocoloski <ko...@apache.org> wrote:
>> On Apr 18, 2010, at 1:37 AM, Jarrod Roberson wrote:
>> 
>> 
>> Hi Jarrod, I'd need a little more detail or an example before I could say whether what you want to do is possible.  Best,
>> 
>> Adam
> 
> I am working on what I think is a clever workaround for not being able
> to run variable "select where" SQL-like queries on CouchDB.
> 
> Here is my map function:
> 
> function(doc)
> {
>  emit(['cnnid', doc.cnnid], null);
>  emit(['guid', doc.guid], null);
>  emit(['src', doc.sourceServer], null);
>  emit(['dest', doc.destServer], null);
> }
> 
> I am running a reduce with group_level=1 that merges lists of _ids by
> field. The result is a unique list of _ids matching each field name
> when run with keys=[["cnnid","11111111"],["src","a"]]. I get output
> that is grouped by cnnid and src. What I want to do is rereduce just
> the "final" output one more time, to reduce the keys down to the
> unique list of _ids from the resulting groups.
> 

I think this is a misuse of CouchDB's reduce, which is really just there to provide numerical aggregation (or other constant-space operations). I'm surprised you haven't been hit with the reduce_overflow_error yet. As your database grows, you almost certainly will be.

If you want to do something like this, you are better off moving all your uniqueness logic to a _list function, and then optimizing your map collation to keep the memory usage of your _list low.
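
A minimal sketch of the _list approach described above, not taken from the thread. It assumes the view is queried without its reduce (reduce=false), so that each row carries the document _id in row.id. The names mergeUniqueIds and listFunction are hypothetical; getRow() and send() are the row iterator and output function that CouchDB's query server provides to list functions. In a real design document the helper would have to be inlined into the list function's source.

```javascript
// Hypothetical helper: drains rows from nextRow() and returns the
// distinct document ids in first-seen order. nextRow is expected to
// return null when there are no more rows, as CouchDB's getRow() does.
function mergeUniqueIds(nextRow) {
  var seen = {};
  var ids = [];
  var row;
  while ((row = nextRow())) {
    if (!Object.prototype.hasOwnProperty.call(seen, row.id)) {
      seen[row.id] = true;
      ids.push(row.id);
    }
  }
  return ids;
}

// Sketch of the list function itself. getRow() and send() exist only
// inside CouchDB's query server, so this is defined here but not called.
var listFunction = function (head, req) {
  send(JSON.stringify({ ids: mergeUniqueIds(getRow) }));
};
```

The seen table still grows with the number of distinct _ids in the response, so this moves the cost out of the reduce index rather than making it constant-space.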

Chris

> 
> Running
> 
> curl -X POST -d '{"keys":[["cnnid","82534864"],["src","a"]]}' \
>   'http://localhost:5984/transfer_central/_design/transfer/_view/search?group=true&group_level=1'
> 
> produces output that looks like this:
> 
> {"rows":[
> {"key":["cnnid","82534864"],"value":["fdbc746e0026B93BD6FE6f83c80de090","fdbc6f930026B92F3075118c8e46f574","fdbc59760026B92F30754e88a5fb1d0a"]},
> {"key":["src","a"],"value":["2fe5b7620026B93BD6FE54240135cf78","3028f1010026B93BD6FE27430d5ff179","3028f3df0026B93BD6FE1aaf792acaec","48a5d7ab0026B93BD6FE347e40beba22","48a5dada0026B93BD6FE5759f3f61946","48a8630c0026B93BD6FE6bc36f72aaf9","673fd21a0026B93BD6FE56b473da6a77","673fd4790026B93BD6FE47aeffcaa16b","67af7dbc0026B93BD6FE3134aabda4b6","67af80b60026B93BD6FE132454bf17b5"]}
> ]}
> 
> I tried group_level=0, but that does the same thing as 1 in my case. I
> tried hacking at the Erlang source to get it to run another final
> rereduce with a "special" group_level=9999, but I didn't have any luck
> getting that to work. What I finally resorted to is a List function
> that does the final reduce/merge of the _ids (really the same thing
> the reduce function is doing), but I think it would be clever if I
> could get the reduce to run one last rereduce instead of relying on
> the List function.
> 
> Here is a post about it in more detail:
> http://www.vertigrated.com/blog/2010/04/where-clauses-like-selects-against-couchdb/

Re: Is there anyway to specify a group_level of "id"?

Posted by Jarrod Roberson <ja...@vertigrated.com>.

On Mon, Apr 19, 2010 at 10:10 AM, Adam Kocoloski <ko...@apache.org> wrote:
> On Apr 18, 2010, at 1:37 AM, Jarrod Roberson wrote:
>
>
> Hi Jarrod, I'd need a little more detail or an example before I could say whether what you want to do is possible.  Best,
>
> Adam

I am working on what I think is a clever workaround for not being able
to run variable "select where" SQL-like queries on CouchDB.

Here is my map function:

function(doc)
{
  emit(['cnnid', doc.cnnid], null);
  emit(['guid', doc.guid], null);
  emit(['src', doc.sourceServer], null);
  emit(['dest', doc.destServer], null);
}

I am running a reduce with group_level=1 that merges lists of _ids by
field. The result is a unique list of _ids matching each field name
when run with keys=[["cnnid","11111111"],["src","a"]]. I get output
that is grouped by cnnid and src. What I want to do is rereduce just
the "final" output one more time, to reduce the keys down to the
unique list of _ids from the resulting groups.
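
The reduce function itself is not shown in this thread. A minimal sketch of one that would behave as described, assuming it collects document _ids into a duplicate-free list, might look like the following. The name mergeIds is hypothetical (in a design document the function would be anonymous). The argument shapes follow CouchDB's reduce contract: on the first pass keys is an array of [emittedKey, docId] pairs, and on a rereduce pass keys is null and values holds the lists returned by earlier passes.

```javascript
// Sketch only: merges document ids into a list without duplicates.
function mergeIds(keys, values, rereduce) {
  var seen = {};
  var ids = [];

  // Append an id only the first time it is encountered.
  function add(id) {
    if (!Object.prototype.hasOwnProperty.call(seen, id)) {
      seen[id] = true;
      ids.push(id);
    }
  }

  var i, j;
  if (rereduce) {
    // values is an array of id lists produced by earlier reduce calls.
    for (i = 0; i < values.length; i++) {
      for (j = 0; j < values[i].length; j++) {
        add(values[i][j]);
      }
    }
  } else {
    // keys[i] is [emittedKey, docId]; the map emitted null as the value.
    for (i = 0; i < keys.length; i++) {
      add(keys[i][1]);
    }
  }
  return ids;
}
```

Because the returned list grows with the number of matching documents, a reduce of this shape is not constant-space.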

Running

curl -X POST -d '{"keys":[["cnnid","82534864"],["src","a"]]}' \
  'http://localhost:5984/transfer_central/_design/transfer/_view/search?group=true&group_level=1'

produces output that looks like this:

{"rows":[
{"key":["cnnid","82534864"],"value":["fdbc746e0026B93BD6FE6f83c80de090","fdbc6f930026B92F3075118c8e46f574","fdbc59760026B92F30754e88a5fb1d0a"]},
{"key":["src","a"],"value":["2fe5b7620026B93BD6FE54240135cf78","3028f1010026B93BD6FE27430d5ff179","3028f3df0026B93BD6FE1aaf792acaec","48a5d7ab0026B93BD6FE347e40beba22","48a5dada0026B93BD6FE5759f3f61946","48a8630c0026B93BD6FE6bc36f72aaf9","673fd21a0026B93BD6FE56b473da6a77","673fd4790026B93BD6FE47aeffcaa16b","67af7dbc0026B93BD6FE3134aabda4b6","67af80b60026B93BD6FE132454bf17b5"]}
]}

I tried group_level=0, but that does the same thing as 1 in my case. I
tried hacking at the Erlang source to get it to run another final
rereduce with a "special" group_level=9999, but I didn't have any luck
getting that to work. What I finally resorted to is a List function
that does the final reduce/merge of the _ids (really the same thing
the reduce function is doing), but I think it would be clever if I
could get the reduce to run one last rereduce instead of relying on
the List function.

Here is a post about it in more detail:
http://www.vertigrated.com/blog/2010/04/where-clauses-like-selects-against-couchdb/

Re: Is there anyway to specify a group_level of "id"?

Posted by Adam Kocoloski <ko...@apache.org>.

On Apr 18, 2010, at 1:37 AM, Jarrod Roberson wrote:

> I have a reduce function that outputs keys with duplicate ids when run
> with group_level=1. I would really like to say something like
> group_level=id and have it group on id rather than on my key.
> Is this possible?
> 
> -- 
> Jarrod Roberson

Hi Jarrod, I'd need a little more detail or an example before I could say whether what you want to do is possible.  Best,

Adam