Posted to dev@couchdb.apache.org by Garren Smith <ga...@apache.org> on 2019/04/23 15:38:35 UTC

[DISCUSS] Built-in reduce indexes

Hi All,

Following on from the map discussion, I want to start the discussion on
built-in reduce indexes.

## Builtin reduces
Builtin reduces are definitely the easier of the two reduce options to
reason about and design. The one factor to keep in mind for reduces is that
we need to be able to reduce at different group levels. So a data model for
that would look like this:

{?DATABASE, ?VIEWS, ?VIEW_SIGNATURE, ?VIEWS, <view_id>, ?REDUCE,
<group_level>, <group_key>, <_reduce_function_name>} -> <aggregate_value>

Most of that is similar to the map data model. Where it changes is from the
?REDUCE subspace: we add the group_level (from 1 -> number of keys emitted
in the map function), then the group key used in the reduce, then the reduce
function name, e.g. _sum or _count, and then we store the aggregated value
as the FDB value.
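
To make that concrete, here is a minimal sketch of writing one such row
with the erlfdb tuple layer. The function name, the pack_value/1 helper,
and the exact subspace handling are assumptions for illustration, not a
settled layout:

    %% Write the aggregated row for a _sum reducer at one group level.
    %% DbPrefix is assumed to be the already-packed
    %% {?DATABASE, ?VIEWS, ?VIEW_SIGNATURE} portion of the key.
    store_reduce_row(Tx, DbPrefix, ViewId, GroupLevel, GroupKey, Value) ->
        Key = erlfdb_tuple:pack(
            {ViewId, ?REDUCE, GroupLevel, GroupKey, <<"_sum">>},
            DbPrefix
        ),
        erlfdb:set(Tx, Key, pack_value(Value)).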

### Index management

To update the reduce indexes, we will rely on the `id_index` and the
`update_seq` defined in the map discussion. Then, to apply changes, we
calculate the change to an aggregate value for the keys at the highest
group level and apply that change to every lower group level using FDB’s
atomic operations [1].
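
As a hedged sketch of that update path (only the _sum case, with
key_prefix/2 and reduce_key/4 as assumed helpers): once we know the delta
for an emitted key, FDB’s atomic ADD lets every group level absorb it
without read-write conflicts:

    %% Apply Delta for one emitted key tuple at every group level.
    %% key_prefix(KeyTuple, Level) would return the first Level elements;
    %% erlfdb:add/3 is FDB's atomic ADD, so many concurrent writers can
    %% update the same group key without transaction conflicts.
    update_sum_levels(Tx, DbPrefix, KeyTuple, Delta) ->
        lists:foreach(fun(Level) ->
            GroupKey = key_prefix(KeyTuple, Level),
            Key = reduce_key(DbPrefix, Level, GroupKey, <<"_sum">>),
            erlfdb:add(Tx, Key, Delta)
        end, lists:seq(1, tuple_size(KeyTuple))).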

### Reducer functions

FDB’s atomic functions support all the built-in reduce functions CouchDB
supports, so we can use those as part of our reduce implementation. For the
`_stats` reduce function, we will have to split the value across multiple
key/values. Its data model has an extra key element to record which stat
each row is for:

{?DATABASE, ?VIEWS, ?VIEW_SIGNATURE, ?VIEWS, <view_id>, ?REDUCE,
<group_level>, <group_key>, <_reduce_function_name>, <_stat_field>} ->
<aggregate_value>
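
A sketch of what that could look like, assuming a stat_key/2 helper and
integer-encoded values: sum, count, and sumsqr map onto atomic ADD, while
min and max map onto FDB’s atomic MIN/MAX (which compare unsigned
little-endian values, so negative numbers would need extra care):

    %% Update all five _stats fields for one reduce row.
    update_stats(Tx, RowPrefix, Value) ->
        erlfdb:add(Tx, stat_key(RowPrefix, <<"sum">>), Value),
        erlfdb:add(Tx, stat_key(RowPrefix, <<"count">>), 1),
        erlfdb:add(Tx, stat_key(RowPrefix, <<"sumsqr">>), Value * Value),
        erlfdb:min(Tx, stat_key(RowPrefix, <<"min">>), Value),
        erlfdb:max(Tx, stat_key(RowPrefix, <<"max">>), Value).

    %% Append the <_stat_field> element to the packed row key.
    stat_key(RowPrefix, Field) ->
        erlfdb_tuple:pack({Field}, RowPrefix).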

We do have some problems with `_approx_count_distinct`, because it does not
support removing keys from the filter. So we have three options:

1. Ignore key removal entirely in the filter, since this is just an
estimate
2. Implement a real COUNT DISTINCT function, which we can do because we’re
not trying to merge results from different local shards
3. Don’t support it going forward

### Group_level=0

One trickier situation is if a user does a group_level=0 query with a key
range. This would require us to do some client-level aggregation: we would
have to get the aggregate values at `group_level=1` for the supplied key
range and then aggregate those values together.
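
A rough sketch of that client-side step, with level1_range/3 and
unpack_value/1 as assumed helpers and _sum as the example reducer:

    %% Serve group_level=0 over a key range: read the group_level=1 rows
    %% in the range and re-aggregate them on the client.
    range_sum_level0(Tx, DbPrefix, StartKey, EndKey) ->
        {Start, End} = level1_range(DbPrefix, StartKey, EndKey),
        Rows = erlfdb:get_range(Tx, Start, End),
        lists:sum([unpack_value(V) || {_K, V} <- Rows]).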

I would love to hear your thoughts and ideas on this.

If you are wondering about custom reduce indexes, I’m still working on that
and will start a discussion email on that a little later.

Cheers
Garren

Re: [DISCUSS] Built-in reduce indexes

Posted by Garren Smith <ga...@apache.org>.
For anyone who is interested in following along: I have an initial PR,
https://github.com/apache/couchdb/pull/2456
It isn't feature complete yet. I've written up on the PR the current state
of the work and the type of feedback I'm looking for now.

Cheers
Garren

Re: [DISCUSS] Built-in reduce indexes

Posted by Garren Smith <ga...@apache.org>.
Hi,

I've updated this RFC
https://github.com/apache/couchdb-documentation/pull/441 with an
implementation of the skiplist algorithm using FDB.

Cheers
Garren

Re: [DISCUSS] Built-in reduce indexes

Posted by Garren Smith <ga...@apache.org>.
Hi All,

Digging up this thread again just to let you know I wrote an RFC on
implementing reduce functions on top of FoundationDB
https://github.com/apache/couchdb-documentation/pull/441

Cheers
Garren

Re: [DISCUSS] Built-in reduce indexes

Posted by Adam Kocoloski <ko...@apache.org>.
Hiya Garren, thanks for starting this one. A few comments inline:

> On Apr 23, 2019, at 11:38 AM, Garren Smith <ga...@apache.org> wrote:
> 
> Hi All,
> 
> Following on from the map discussion, I want to start the discussion on
> built-in reduce indexes.
> 
> ## Builtin reduces
> Builtin reduces are definitely the easier of the two reduce options to
> reason about and design. The one factor to keep in mind for reduces is that
> we need to be able to reduce at different group levels. So a data model for
> that would look like this:
> 
> {?DATABASE, ?VIEWS, ?VIEW_SIGNATURE, ?VIEWS, <view_id>, ?REDUCE,
> <group_level>, <group_key>, <_reduce_function_name>} -> <aggregate_value>

- I think this should be <view_signature>, not ?VIEW_SIGNATURE — you’re intending for the actual checksum to go in that element of the tuple, right? I wonder about using a Directory in that element of the tuple; adding the full signature to every key will blow up the index substantially.

- I don’t think the second ?VIEWS is needed — you already scoped to ?VIEWS above it.

- Why is the <_reduce_function_name> after the <group_key>? If someone defines multiple reduce functions on a single view and then wants to do a range query on the grouped result this will mean reading and discarding the output from the other reduce function. It seems to me that it would make more sense to put this directly after ?REDUCE.

- You could consider having specific constants for each builtin reduce function, e.g. ?SUM, instead of the full “_sum” byte string in <_reduce_function_name>, which would save several bytes per key.

> 
> Most of that is similar to the map data model. Where it changes is from the
> ?REDUCE subspace: we add the group_level (from 1 -> number of keys emitted
> in the map function), then the group key used in the reduce, then the reduce
> function name, e.g. _sum or _count, and then we store the aggregated value
> as the FDB value.

I think we need a better definition of <group_level> here. It’s not really the number of keys emitted from the map function, it’s the number of elements of an array key emitted by the map function that are used to determine the degree of aggregation.

Also, before we even get to group_level we should describe how reduce functions work with keys that are not arrays. In the current API this is group=true (separate aggregations for each set of KV pairs that share the same key) or group=false (one aggregated result for the entire view). Internally in the current codebase we map these two settings to group_level=exact and group_level=0, respectively. Basically, we need a way to represent “exact” in the <group_level>.
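
A short worked example, assuming a view that emits the array key
[2019, 4, 23] with a _sum reducer:

    group=false / group_level=0     one aggregate row for the entire view
    group_level=1                   one row per [YYYY], e.g. [2019]
    group_level=2                   one row per [YYYY, MM], e.g. [2019, 4]
    group=true (group_level=exact)  one row per full key, e.g. [2019, 4, 23]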

> ### Index management
> 
> To update the reduce indexes, we will rely on the `id_index` and the
> `update_seq` defined in the map discussion. Then, to apply changes, we
> calculate the change to an aggregate value for the keys at the highest
> group level and apply that change to every lower group level using FDB’s
> atomic operations [1].

This is a really important point and worth calling out more explicitly. One reason most of the builtins are significantly simpler than the custom functions is because we know exactly how the aggregate values change when we get the map output for each updated doc in isolation. We don’t need to go retrieve all the rest of the map output that happens to share the same map key. Moreover, the change to the aggregate value can be applied using atomic operations so we don’t need to do anything special to avoid txn conflicts for views where lots of documents emit the same map key.

> ### Reducer functions
> 
> FDB’s atomic functions support all the built-in reduce functions CouchDB
> supports, so we can use those as part of our reduce implementation. For the
> `_stats` reduce function, we will have to split the value across multiple
> key/values. Its data model has an extra key element to record which stat
> each row is for:
> 
> {?DATABASE, ?VIEWS, ?VIEW_SIGNATURE, ?VIEWS, <view_id>, ?REDUCE,
> <group_level>, <group_key>, <_reduce_function_name>, <_stat_field>} ->
> <aggregate_value>
> 
> We do have some problems with `_approx_count_distinct`, because it does not
> support removing keys from the filter. So we have three options:
> 
> 1. Ignore key removal entirely in the filter, since this is just an
> estimate
> 2. Implement a real COUNT DISTINCT function, which we can do because we’re
> not trying to merge results from different local shards
> 3. Don’t support it going forward

There’s actually a fourth option here, which is to maintain _approx_count_distinct using the future machinery for custom reduce functions, generating a new filter from the raw map output at the lowest level when keys are removed. I don’t have an informed opinion yet about that approach.

> ### Group_level=0
> 
> One trickier situation is if a user does a group_level=0 query with a key
> range. This would require us to do some client-level aggregation: we would
> have to get the aggregate values at `group_level=1` for the supplied key
> range and then aggregate those values together.

This isn’t unique to group_level=0; it could happen for any group_level!=exact. It can also require descending more than one group_level depending on the precision of the supplied key range; e.g. for a date-based key like [YYYY, MM, DD], someone could ask for yearly aggregations with group_level=1 but supply a startkey and endkey with specific dates rather than months.
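
For example (hypothetical numbers), a group_level=1 query with
startkey=[2018, 6, 15] and endkey=[2019, 3, 10] cannot return the stored
[2018] aggregate as-is: the partial years have to be reassembled from
level-2 rows for the whole months and level-3 rows for the partial months,
and then combined before responding.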

Good start! Cheers, Adam