Posted to user@couchdb.apache.org by Matt Goodall <ma...@gmail.com> on 2009/02/05 15:33:26 UTC

Reduce to nothing

Hi,

Is it possible to reduce a key to nothing, i.e. completely remove a
key from the reduction result?

For instance, say you post three documents:

    {"_id": "thing1", "type": "thing"}
    {"_id": "thing2", "type": "thing"}
    {"_id": "...", "type": "cancellation", "cancels": "thing1"}

It's trivial to produce a map function that collates the "thing" and
"cancellation" documents. However, I can't work out how, or even if
it's possible, to reduce the view so that only "thing2" remains.
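
For concreteness, a collating map function along those lines might look
like the sketch below. It is only an illustration: the emit stand-in, the
rows array, the "cancel1" id and the sort that imitates view collation are
all assumptions added so the snippet runs outside CouchDB.

```javascript
// Stand-in for CouchDB's emit(), so the sketch runs on its own.
const rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

// File each cancellation under the id of the thing it cancels.
function map(doc) {
  if (doc.type === "thing") {
    emit([doc._id, "thing"], null);
  } else if (doc.type === "cancellation") {
    emit([doc.cancels, "cancellation"], null);
  }
}

[
  { _id: "thing1", type: "thing" },
  { _id: "thing2", type: "thing" },
  { _id: "cancel1", type: "cancellation", cancels: "thing1" } // hypothetical id
].forEach(map);

// Rough imitation of view ordering for these simple string keys.
rows.sort(function (a, b) {
  const ka = a.key.join("\u0000");
  const kb = b.key.join("\u0000");
  return ka < kb ? -1 : ka > kb ? 1 : 0;
});

// The cancellation row lands next to the "thing1" row it refers to.
console.log(rows.map(function (r) { return r.key; }));
```

The rows for "thing1" end up adjacent, but both are still present; nothing
in the map step alone can make "thing1" vanish.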

This seems like it might be a useful thing to be able to do at times.
For instance, posting a "cancellation" document effectively removes a
"thing" from the database. Nothing needs to touch the "thing",
avoiding any chance of conflicts and it doesn't matter how many
"cancellation" documents get posted making cancellation an idempotent
operation.

I tried not returning anything, just in case it worked ;-), but got a
JSON encoding error (can't encode undefined, iirc). That would be a
simple approach. However, I wondered if the more "normal" approach of
allowing a reduce function to emit zero or more (key, value) pairs
would be even better? Not sure if that would break the incremental
reduce though.

Any ideas, or is there another way to achieve this that I'm missing?

- Matt

Re: Reduce to nothing

Posted by Matt Goodall <ma...@gmail.com>.
2009/2/5 Paul Davis <pa...@gmail.com>:
> Matt,
>
> About the best you're going to get with the current implementation
> would be similar to the code that Jeremy posted which in effect
> returns a 'cancelled' value for that item.

Yeah, which leaves you getting everything out and checking the value
instead of the cancelled docs just "disappearing". Not exactly really
;-).

>  I did just wake up so I
> might be missing something clever, but I'm pretty sure this is baked
> in because of the incremental calculation stuff.

:nod: I had a feeling it might break things down there but, not
knowing the code, wasn't sure if it was something someone just hadn't
tried yet. I thought removing a key completely might be feasible, but
less sure about multiple emits.

Oh well, it was a fun idea.

- Matt

Re: Reduce to nothing

Posted by Paul Davis <pa...@gmail.com>.
Matt,

About the best you're going to get with the current implementation
would be similar to the code that Jeremy posted which in effect
returns a 'cancelled' value for that item. I did just wake up so I
might be missing something clever, but I'm pretty sure this is baked
in because of the incremental calculation stuff.

HTH,
Paul Davis

Re: Reduce to nothing

Posted by Chris Anderson <jc...@apache.org>.
On Thu, Feb 5, 2009 at 11:55 PM, Brian Candler <B....@pobox.com> wrote:
> Nearest I can think of is to collate your view such that the cancellation
> comes immediately after the thing:
>
>  ["thing1","thing"]
>  ["thing1","cancellation"]
>
> Then the client can see that these two are adjacent and easily check if the
> item has been cancelled.
>

This is the right way to do it. If you put the cancellation just
*before* the item it cancels, then you can even have a proxy "hide"
the canceled items in constant space with no buffering.
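
A rough sketch of such a filter, assuming the view rows carry keys of the
form [id, marker] and that a cancellation's marker sorts before the
thing's (the row shape and the hideCancelled name are made up for
illustration, not part of any CouchDB API):

```javascript
// Streams rows through, remembering only the most recent cancelled id.
function* hideCancelled(rows) {
  let cancelledId = null; // the only state the filter keeps
  for (const row of rows) {
    const id = row.key[0];
    const marker = row.key[1];
    if (marker === "cancellation") {
      cancelledId = id;      // swallow the cancellation row itself
    } else if (id !== cancelledId) {
      yield row;             // a live thing, pass it on
    }
  }
}

// Hypothetical view output, already in key order.
const viewRows = [
  { key: ["thing1", "cancellation"], value: null },
  { key: ["thing1", "thing"], value: null },
  { key: ["thing2", "thing"], value: null }
];

const visible = Array.from(hideCancelled(viewRows)).map(function (r) {
  return r.key[0];
});
console.log(visible); // only "thing2" is left
```

Because the cancellation arrives first, the filter never has to hold a
thing back while waiting to see whether a cancellation follows.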

-- 
Chris Anderson
http://jchris.mfdz.com

Re: Reduce to nothing

Posted by Brian Candler <B....@pobox.com>.
On Thu, Feb 05, 2009 at 08:44:16AM -0600, Jeremy Wall wrote:
> Is it possible to reduce a key to nothing, i.e. completely remove a
> key from the reduction result?
> 
> For instance, say you post three documents:
> 
>    {"_id": "thing1", "type": "thing"}
>    {"_id": "thing2", "type": "thing"}
>    {"_id": "...", "type": "cancellation", "cancels": "thing1"}
> 
> It's trivial to produce a map function that collates the "thing" and
> "cancellation" documents. However, I can't work out how, or even if
> it's possible, to reduce the view so that only "thing2" remains.

Nearest I can think of is to collate your view such that the cancellation
comes immediately after the thing:

  ["thing1","thing"]
  ["thing1","cancellation"]

Then the client can see that these two are adjacent and easily check if the
item has been cancelled.

Unfortunately, you can't rely on this in the reduce function, because
sometimes the thing will be in one block of keys/values and the cancellation
will be in another.

If the purpose of your reduce view is only to _count_ how many live things
you have, then you could map:

  ["thing1",1]        # thing
  ["thing1",-1]       # cancellation

and then sum. This won't be right if you can have multiple cancellations for
a thing, but you could avoid this by choosing your doc id naming convention
for cancellations (e.g. "thing1_cancel"). In any case, if you return a
grouped reduce, and see a zero or negative value for a particular key, you
know it has been cancelled.
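
As a sketch of the counting idea (the emit stand-in and the grouping loop
only imitate what CouchDB does for a group=true query; they are
assumptions so the snippet runs standalone):

```javascript
// Stand-in for CouchDB's emit().
const emitted = [];
function emit(key, value) { emitted.push({ key: key, value: value }); }

function map(doc) {
  if (doc.type === "thing") emit(doc._id, 1);
  else if (doc.type === "cancellation") emit(doc.cancels, -1);
}

// Summing works the same way for reduce and rereduce.
function reduce(keys, values, rereduce) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

[
  { _id: "thing1", type: "thing" },
  { _id: "thing2", type: "thing" },
  { _id: "thing1_cancel", type: "cancellation", cancels: "thing1" }
].forEach(map);

// Imitate group=true: one reduction per distinct key.
const valuesByKey = {};
emitted.forEach(function (row) {
  (valuesByKey[row.key] = valuesByKey[row.key] || []).push(row.value);
});
const perKey = {};
Object.keys(valuesByKey).forEach(function (key) {
  perKey[key] = reduce(null, valuesByKey[key], false);
});

// Imitate the ungrouped query: one value for the whole view.
const liveCount = reduce(null, Object.keys(perKey).map(function (k) {
  return perKey[k];
}), true);

console.log(perKey, liveCount);
```

Here "thing1" sums to 0 and "thing2" to 1, so the total is one live thing.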

> I tried not returning anything, just in case it worked ;-), but got a
> JSON encoding error (can't encode undefined, iirc).

You can encode null, though.
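
To illustrate the difference, using the standard JSON.stringify rather
than whatever serializer the view server of the day used (so treat this
as an analogy, not a reproduction of the error above; both function names
are made up):

```javascript
// A reduce that falls off the end returns undefined.
function reduceToNothing(keys, values, rereduce) {
  // no return statement
}

// A reduce that explicitly returns null.
function reduceToNull(keys, values, rereduce) {
  return null;
}

// undefined has no JSON representation; null does.
const encodedNothing = JSON.stringify(reduceToNothing(null, [], false));
const encodedNull = JSON.stringify(reduceToNull(null, [], false));

console.log(encodedNothing); // undefined (no JSON text is produced)
console.log(encodedNull);    // the string "null"
```

A null result still leaves a row for the key, though; it marks the key
rather than removing it.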

> However, I wondered if the more "normal" approach of
> allowing a reduce function to emit zero or more (key, value) pairs
> would be even better?

I think reduce functions have to return a single value - at least, all the
ones I've seen do this. IIUC, all the k/v pairs pointed to by a single
b-tree node are reduced to one value, which is stored within the same b-tree
node. Then the parent b-tree nodes contain the reduction of their children.
The root node contains the reduction of everything to a single value, and
this is what you get if you query without group=true. If you query with
startkey and endkey then the reduce value is recalculated across the range
of keys you specify.

So a reduce function is not a filter on map output, but an aggregation /
summarisation function.
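
A toy model of that layout, assuming a two-level tree whose leaf nodes
hold the mapped values (the numbers and node boundaries are invented, and
the real b-tree is of course more involved):

```javascript
// A sum reduce; rereduce input has the same shape as reduce output.
function reduce(keys, values, rereduce) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Mapped values as they might be split across three leaf nodes.
const leaves = [[1, -1, 1], [1, 1], [1]];

// Each leaf stores the reduction of its own values...
const leafReductions = leaves.map(function (values) {
  return reduce(null, values, false);
});

// ...and the parent stores the reduction of its children's reductions.
const root = reduce(null, leafReductions, true);

// Reducing everything in one pass must give the same answer.
const flat = reduce(null, [].concat.apply([], leaves), false);

console.log(leafReductions, root, flat);
```

The split between leaves is outside the reduce function's control, which
is why a thing and its cancellation cannot be relied on to meet in the
same call, and why each call has to hand back one composable value.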

Regards,

Brian.

Re: Reduce to nothing

Posted by Jeremy Wall <jw...@google.com>.
On Thu, Feb 5, 2009 at 8:50 AM, Matt Goodall <ma...@gmail.com> wrote:

> 2009/2/5 Jeremy Wall <jw...@google.com>:
> > Can't you just have the map screen out the cancellations? Seems like that's
> > the most natural spot for the behaviour you want.
>
> Nope, because the "thing" doesn't know it's been cancelled.
>
> - Matt
>
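map:
function(doc) {
  // A cancellation is filed under the id of the thing it cancels.
  if (doc.cancels)
    emit(doc.cancels, ["canceled"]);
  else
    emit(doc._id, []);
}

reduce:
function(keys, values, rereduce) {
  // Each value is [] (live) or ["canceled"]. The result has the same
  // shape, so the same loop also covers the rereduce case.
  for (var i = 0; i < values.length; i++) {
    if (values[i].length > 0)
      return ["canceled"];
  }
  return [];
}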

Re: Reduce to nothing

Posted by Matt Goodall <ma...@gmail.com>.
2009/2/5 Michael Marks <mi...@gmx.de>:
> How about having a third attribute "state": with possible values "active",

Sure, that's possible but it defeats the point of my original post,
i.e. not touching the original document.

Just to be clear, I know you can stick any old attribute on a "thing"
document and filter it in a view. What I'm interested in is using the
reduce function in more interesting ways that /may/ have additional
benefits in some cases.

- Matt

Re: [user] Re: Reduce to nothing

Posted by Wout Mertens <wm...@cisco.com>.
I think the whole point of map/reduce is that it tells you
"OK, anything you can do with this model is fine and fast. Anything you
can't do with this model you'll have to work around in your
application, because it can't be implemented in a scalable way".

No?

Wout.

On Feb 5, 2009, at 4:00 PM, Michael Marks wrote:

> How about having a third attribute "state": with possible values  
> "active", "cancelled" and filtering about this one?

Re: Reduce to nothing

Posted by Michael Marks <mi...@gmx.de>.
How about having a third attribute "state", with possible values
"active" and "cancelled", and filtering on that?

________________________________________
Michael Marks




Re: Reduce to nothing

Posted by Matt Goodall <ma...@gmail.com>.
2009/2/5 Jeremy Wall <jw...@google.com>:
> Can't you just have the map screen out the cancellations? Seems like that's
> the most natural spot for the behaviour you want.

Nope, because the "thing" doesn't know it's been cancelled.

- Matt


Re: Reduce to nothing

Posted by Jeremy Wall <jw...@google.com>.
Can't you just have the map screen out the cancellations? Seems like that's
the most natural spot for the behaviour you want.
