Posted to dev@couchdb.apache.org by Garren Smith <ga...@apache.org> on 2019/09/24 14:01:05 UTC

Re: [DISCUSS] Built-in reduce indexes

Hi All,

Digging up this thread again just to let you know I wrote an RFC on
implementing reduce functions on top of FoundationDB
https://github.com/apache/couchdb-documentation/pull/441

Cheers
Garren

On Thu, Apr 25, 2019 at 6:05 PM Adam Kocoloski <ko...@apache.org> wrote:

> Hiya Garren, thanks for starting this one. A few comments inline:
>
> > On Apr 23, 2019, at 11:38 AM, Garren Smith <ga...@apache.org> wrote:
> >
> > Hi All,
> >
> > Following on from the map discussion, I want to start the discussion on
> > built-in reduce indexes.
> >
> > ## Builtin reduces
> > Builtin reduces are definitely the easier of the two reduce options to
> > reason about and design. The one factor to keep in mind for reduces is that
> > we need to be able to reduce at different group levels. So a data model for
> > that would look like this:
> >
> > {?DATABASE, ?VIEWS, ?VIEW_SIGNATURE, ?VIEWS, <view_id>, ?REDUCE,
> > <group_level>, <group_key>, <_reduce_function_name>} ->
> > <aggregate_value>
>
> - I think this should be <view_signature>, not ?VIEW_SIGNATURE — you’re
> intending for the actual checksum to go in that element of the tuple,
> right? I wonder about using a Directory in that element of the tuple;
> adding the full signature to every key will blow up the index substantially.
>
> - I don’t think the second ?VIEWS is needed — you already scoped to ?VIEWS
> above it.
>
> - Why is the <_reduce_function_name> after the <group_key>? If someone
> defines multiple reduce functions on a single view and then wants to do a
> range query on the grouped result this will mean reading and discarding the
> output from the other reduce function. It seems to me that it would make
> more sense to put this directly after ?REDUCE.
>
> - You could consider having specific constants for each builtin reduce
> function, e.g. ?SUM, instead of the full “_sum” byte string in
> <_reduce_function_name>, which would save several bytes per key.
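
To make the layout concrete, here is a rough sketch of how such a reduce key
could be packed with the FoundationDB tuple layer in Python. Everything here
is illustrative: the subspace tags, the position of the reduce-function tag
(moved directly after the REDUCE tag, per the comment above), and the helper
name are assumptions rather than the actual CouchDB implementation.

    import fdb
    fdb.api_version(620)

    # Hypothetical single-byte tags standing in for ?VIEWS, ?REDUCE, ?SUM, ...
    VIEWS, REDUCE, SUM, COUNT = 0x01, 0x02, 0x10, 0x11

    def reduce_key(db_prefix, view_signature, view_id, reduce_tag,
                   group_level, group_key):
        # group_key is the (possibly truncated) emitted key as a tuple,
        # e.g. (2019, 9) for group_level=2 of a [YYYY, MM, DD] key.
        return fdb.tuple.pack(
            (db_prefix, VIEWS, view_signature, view_id, REDUCE,
             reduce_tag, group_level) + tuple(group_key))

    # e.g. the key holding the _sum aggregate of ("2019", 9) at group_level=2:
    key = reduce_key(b'db1', b'sig', b'view1', SUM, 2, ("2019", 9))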
>
> >
> > Most of that is similar to the map data model; where it differs is from the
> > ?REDUCE subspace onward: we add the group_level (from 1 to the number of
> > keys emitted in the map function), then the group key used in the reduce,
> > then the reduce function name, e.g. _sum or _count, and finally we store the
> > aggregated value as the FDB value.
>
> I think we need a better definition of <group_level> here. It’s not really
> the number of keys emitted from the map function, it’s the number of
> elements of an array key emitted by the map function that are used to
> determine the degree of aggregation.
>
> Also, before we even get to group_level we should describe how reduce
> functions work with keys that are not arrays. In the current API this is
> group=true (separate aggregations for each set of KV pairs that share the
> same key) or group=false (one aggregated result for the entire view).
> Internally in the current codebase we map these two settings to
> group_level=exact and group_level=0, respectively. Basically, we need a way
> to represent “exact” in the <group_level>.
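
A small sketch of how those query parameters could collapse onto a single
internal group level, including a sentinel for "exact". The sentinel value,
helper names, and the treatment of non-array keys are assumptions made for
illustration only.

    GROUP_EXACT = -1  # hypothetical sentinel meaning "group on the full key"

    def effective_group_level(group=None, group_level=None):
        # group_level=N takes precedence; group=true means "exact",
        # group=false (the default for reduce queries) means level 0.
        if group_level is not None:
            return group_level
        return GROUP_EXACT if group else 0

    def group_key_for(emitted_key, level):
        # Non-array keys only group as a whole; array keys are truncated
        # to their first `level` elements (level 0 gives the empty tuple,
        # i.e. one aggregate for the whole view).
        if level == GROUP_EXACT:
            return emitted_key
        if not isinstance(emitted_key, (list, tuple)):
            emitted_key = (emitted_key,)
        return tuple(emitted_key)[:level]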
>
> > ### Index management
> >
> > To update the reduce indexes, we will rely on the `id_index` and the
> > `update_seq` defined in the map discussion. Then, to apply changes, we
> > calculate the change to an aggregate value for the keys at the highest
> > group level, and apply that change to all the group levels below it
> > using FDB's atomic operations [1].
>
> This is a really important point and worth calling out more explicitly.
> One reason most of the builtins are significantly simpler than the custom
> functions is because we know exactly how the aggregate values change when
> we get the map output for each updated doc in isolation. We don’t need to
> go retrieve all the rest of the map output that happens to share the same
> map key. Moreover, the change to the aggregate value can be applied using
> atomic operations so we don’t need to do anything special to avoid txn
> conflicts for views where lots of documents emit the same map key.
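
A hedged sketch of that update path in Python against the FoundationDB
bindings. The key-building helper and the _sum-only handling are assumptions;
the point is that the same delta is applied at every group level with an
atomic ADD, so concurrent writers never conflict on hot keys.

    import struct

    import fdb
    fdb.api_version(620)

    @fdb.transactional
    def apply_sum_delta(tr, key_for_level, emitted_key, delta):
        # FDB's ADD mutation interprets the value as a little-endian integer,
        # so the same packed delta can be reused for every group level.
        packed = struct.pack('<q', delta)
        for level in range(len(emitted_key) + 1):   # levels 0..exact
            group_key = tuple(emitted_key)[:level]
            tr.add(key_for_level(level, group_key), packed)

    # e.g. a doc that emitted ([2019, 9, 24], 5): add 5 at levels 0..3 in one
    # txn (and subtract the old value the same way when the doc is updated).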
>
> > ### Reducer functions
> >
> > FDB's atomic functions support all the built-in reduce functions that
> > CouchDB supports, so we can use those as part of our reduce function. For
> > the `_stats` reduce function, we will have to split the value across
> > multiple key/value pairs, so its data model has an extra key element to
> > record which stat field it holds for the _stats reducer:
> >
> > {?DATABASE, ?VIEWS, ?VIEW_SIGNATURE, ?VIEWS, <view_id>, ?REDUCE,
> > <group_level>, <group_key>, <_reduce_function_name>, <_stat_field>} ->
> > <aggregate_value>
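
For example, each _stats field could live under its own key and be maintained
with the matching FDB atomic mutation. A sketch: the field names mirror the
existing _stats object, the key helper is an assumption, and only integer
values are shown even though _stats also accepts floats.

    import struct

    def update_stats(tr, stats_key, value):
        # stats_key(field) returns the full reduce key ending in <_stat_field>.
        packed = struct.pack('<q', value)
        tr.add(stats_key(b'sum'), packed)
        tr.add(stats_key(b'count'), struct.pack('<q', 1))
        tr.min(stats_key(b'min'), packed)     # atomic little-endian MIN
        tr.max(stats_key(b'max'), packed)     # atomic little-endian MAX
        tr.add(stats_key(b'sumsqr'), struct.pack('<q', value * value))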
> >
> > We do have some problems with `_approx_count_distinct`, because it does
> > not support removing keys from the filter. So we have three options:
> >
> > 1. We can ignore key removal entirely in the filter, since this is just an
> > estimate
> > 2. Implement a real COUNT DISTINCT function, which we can do because we're
> > not trying to merge results from different local shards
> > 3. Don't support it going forward
>
> There’s actually a fourth option here, which is to maintain
> _approx_count_distinct using the future machinery for custom reduce
> functions, generating a new filter from the raw map output at the lowest
> level when keys are removed. I don’t have an informed opinion yet about
> that approach.
>
> > ### Group_level=0
> >
> > One trickier situation is if a user does a group_level=0 query with a key
> > range; this would require us to do some client-level aggregation. We would
> > have to get the aggregate values at `group_level=1` for the supplied key
> > range and then aggregate those values together.
>
> This isn’t unique to group_level=0, it could happen for any
> group_level!=exact. It can also require descending more than one
> group_level depending on the precision of the supplied key range; e.g. for
> a date-based key like [YYYY, MM, DD], someone could ask for yearly
> aggregations with group_level=1 but supply a startkey and endkey with
> specific dates rather than months.
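
In that case the query layer has to read the stored aggregates at a finer
level than requested for the parts of the range that don't line up with a
group boundary and re-aggregate them itself. A small sketch of the rollup
step, _sum-only, with the finer-level range read assumed to have happened
already:

    import struct
    from collections import defaultdict

    def rollup(rows, level):
        # rows: iterable of (group_key_tuple, aggregate_bytes) read at a
        # finer group level that exactly covers the requested key range.
        totals = defaultdict(int)
        for group_key, value in rows:
            totals[group_key[:level]] += struct.unpack('<q', value)[0]
        return dict(totals)

    # e.g. yearly totals (level=1) built from monthly or daily rows when
    # startkey/endkey fall in the middle of a year.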
>
> Good start! Cheers, Adam
>
>

Re: [DISCUSS] Built-in reduce indexes

Posted by Garren Smith <ga...@apache.org>.
For anyone that is interested in following along on this. I have an initial
PR https://github.com/apache/couchdb/pull/2456
This isn't feature complete yet. I've written on the PR the current state
of the work and the type of feedback I'm looking for now.

Cheers
Garren

Re: [DISCUSS] Built-in reduce indexes

Posted by Garren Smith <ga...@apache.org>.
Hi,

I've updated this RFC
https://github.com/apache/couchdb-documentation/pull/441 with an
implementation of the skiplist algorithm using FDB.

Cheers
Garren
