Posted to notifications@couchdb.apache.org by "Thaina Yu (JIRA)" <ji...@apache.org> on 2017/04/11 03:54:41 UTC

[jira] [Created] (COUCHDB-3372) Exclude join from reduce

Thaina Yu created COUCHDB-3372:
----------------------------------

             Summary: Exclude join from reduce
                 Key: COUCHDB-3372
                 URL: https://issues.apache.org/jira/browse/COUCHDB-3372
             Project: CouchDB
          Issue Type: Wish
          Components: View Server Support
            Reporter: Thaina Yu


I wish that, when a reduce function returns `undefined` as the value for a key, that key would be removed from the reduce result. This would make an exclude join possible.

One scenario is my own case.

Suppose I have 1000 user documents and 100 room documents.

Each user document contains a user id and a name:
{code:javascript}
{ _id : "user1" , name : "J"}
{ _id : "user2" , name : "K"}
{ _id : "user3" , name : "L"}
{ _id : "user4" , name : "M"}
...
{code}

Each room document contains an array of user ids:
{code:javascript}
{ _id : "room1" , users : ["user1","user2"] }
{ _id : "room2" , users : ["user3"] }
...
{code}
My system manages rooms in particular, so a user's document does not need to be updated when the user changes rooms.

But then I need to find the users who don't belong to any room, and that cannot be done with map/reduce on the room documents alone.

In an SQL system I would use an exclude join to find every user in the user table that is not contained in any room of the room table.
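
For comparison, the intended exclude-join semantics can be sketched in plain JavaScript, outside of CouchDB. This is only an illustration: the `users` and `rooms` arrays mirror the sample documents above, and `usersWithoutRoom` is a hypothetical helper name, not part of any API.

```javascript
// Sample data mirroring the documents above (illustrative only).
const users = [
  { _id: "user1", name: "J" },
  { _id: "user2", name: "K" },
  { _id: "user3", name: "L" },
  { _id: "user4", name: "M" },
];
const rooms = [
  { _id: "room1", users: ["user1", "user2"] },
  { _id: "room2", users: ["user3"] },
];

// Hypothetical helper: return the users that appear in no room's `users` array.
function usersWithoutRoom(users, rooms) {
  const inSomeRoom = new Set();
  rooms.forEach(function (room) {
    room.users.forEach(function (userID) { inSomeRoom.add(userID); });
  });
  return users.filter(function (user) { return !inSomeRoom.has(user._id); });
}

const lonely = usersWithoutRoom(users, rooms);
console.log(lonely.map(function (u) { return u._id; })); // only "user4" is in no room
```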

But in a map/reduce system like CouchDB, I cannot find a way to do it.

One solution I can think of: when the reduce function returns `undefined` as the result value, that key should be excluded from the reduce result.

In my case I would emit the documents above with this map function:

{code:javascript}
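function (doc) {
    // Assumes each document carries a "type" field of "user" or "room"
    if (doc.type == "user")
        emit(doc._id, doc.name); // one row per user
    if (doc.type == "room")
        doc.users.forEach(function (userID) { emit(userID, doc._id); }); // one row per room membership
}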
{code}

which produces rows like this:

{code:javascript}
key : "user1" , value : "J"
key : "user1" , value : "room1"
key : "user2" , value : "K"
key : "user2" , value : "room1"
key : "user3" , value : "L"
key : "user3" , value : "room2"
key : "user4" , value : "M"
{code}
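
To see how these rows arise, the map step can be simulated in plain JavaScript. This sketch assumes each document carries a `type` field (the sample documents above do not show one), and `runMap` with its local `emit` is a hypothetical stand-in for the view server, not CouchDB's actual implementation.

```javascript
// Sample documents; the `type` field is an assumption needed by the map function.
const docs = [
  { _id: "user1", type: "user", name: "J" },
  { _id: "user2", type: "user", name: "K" },
  { _id: "user3", type: "user", name: "L" },
  { _id: "user4", type: "user", name: "M" },
  { _id: "room1", type: "room", users: ["user1", "user2"] },
  { _id: "room2", type: "room", users: ["user3"] },
];

// Hypothetical stand-in for the view server: runs the map logic over every
// document and collects the emitted rows, ordered by key.
function runMap(docs) {
  const rows = [];
  function emit(key, value) { rows.push({ key: key, value: value }); }
  docs.forEach(function (doc) {
    if (doc.type == "user")
      emit(doc._id, doc.name);
    if (doc.type == "room")
      doc.users.forEach(function (userID) { emit(userID, doc._id); });
  });
  // Plain string ordering by key; real view collation is more involved.
  rows.sort(function (a, b) { return a.key < b.key ? -1 : a.key > b.key ? 1 : 0; });
  return rows;
}

const mapRows = runMap(docs);
console.log(mapRows.length); // 7 rows: 4 from users, 3 from room memberships
```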

And reduce them with:
{code:javascript}
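function (keys, values, rereduce) {
    // Rows for this key: raw map rows on the first pass, partial counts on rereduce
    var count = rereduce ? sum(values) : values.length;
    if (count < 2)
        return count;

    return undefined; // Here is the key: a user with no room has only 1 row for their id
}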
{code}

With this function, the current result for {code}group_level=1{code} would be:

{code:javascript}
key : "user1" , value : undefined
key : "user2" , value : undefined
key : "user3" , value : undefined
key : "user4" , value : 1
{code}

The result I would like is:

{code:javascript}
// user1, user2 and user3 are excluded from the reduce result
key : "user4" , value : 1
{code}
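
The wished-for behaviour can be sketched outside the view server. The following is a minimal simulation under stated assumptions, not CouchDB's implementation: `groupReduce` is a hypothetical helper that groups map rows by key, applies a reduce function once per group (no rereduce pass), and drops every key whose reduced value is `undefined`.

```javascript
// Rows as produced by the map function above.
const rows = [
  { key: "user1", value: "J" },
  { key: "user1", value: "room1" },
  { key: "user2", value: "K" },
  { key: "user2", value: "room1" },
  { key: "user3", value: "L" },
  { key: "user3", value: "room2" },
  { key: "user4", value: "M" },
];

// Reduce for one key: keep the count only when the key has a single row.
function reduceFn(keys, values, rereduce) {
  const count = values.length;
  return count < 2 ? count : undefined;
}

// Hypothetical helper: group rows by key, reduce each group once, and skip
// keys whose reduced value is undefined (the behaviour wished for here).
function groupReduce(rows, reduceFn) {
  const groups = new Map();
  rows.forEach(function (row) {
    if (!groups.has(row.key)) groups.set(row.key, []);
    groups.get(row.key).push(row.value);
  });
  const result = [];
  groups.forEach(function (values, key) {
    // Simplified: only the bare key is passed here, once per value.
    const keys = values.map(function () { return key; });
    const value = reduceFn(keys, values, false);
    if (value !== undefined) result.push({ key: key, value: value });
  });
  return result;
}

const reduced = groupReduce(rows, reduceFn);
console.log(JSON.stringify(reduced)); // [{"key":"user4","value":1}]
```

The same filter could be applied on the client to the rows of a grouped view response, but every key would still have to be transferred first, which is the cost this wish aims to avoid.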

This would save me from loading 1000 (or maybe 100000 in the real system) unnecessary items, so that I see only the items I actually want to know about.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)